INTELLIGENT NETWORK INTERFACE DEVICE
AND SYSTEM FOR ACCELERATING COMMUNICATION

The present invention relates generally to computer or other networks, and more particularly to processing of information communicated between hosts such as computers connected to a network.

The advantages of network computing are increasingly evident. The convenience and efficiency of providing information, communication or computational power to individuals at their personal computer or other end user devices has led to rapid growth of such network computing.

As is well known, most network computer communication is accomplished with the aid of a layered software architecture for moving information between host computers connected to the network. The layers help to segregate information into manageable segments, the general functions of each layer often based on an international standard called Open Systems Interconnection (OSI). OSI sets forth seven processing layers through which information may pass when received by a host in order to be presentable to an end user. Similarly, transmission of information from a host to the network may pass through those seven processing layers in reverse order. Each step of processing and service by a layer may include copying the processed information. Another reference model that is widely implemented, called TCP/IP (TCP stands for Transmission Control Protocol, while IP denotes Internet Protocol), essentially employs five of the seven layers of OSI.

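Networks may include, for instance, a high-speed bus such as an Ethernet connection or an internet connection between disparate local area networks (LANs), each of which includes multiple hosts, or any of a variety of other known means for data transfer between hosts. According to the OSI standard, physical layers are connected to the network at respective hosts, the physical layers providing transmission and receipt of raw data bits via the network. A data link layer is serviced by each physical layer of the hosts, the data link layers providing frame division and error correction to the data received from the physical layers, as well as processing acknowledgment frames sent by the receiving host. The network layers are serviced by respective data link layers, the network layers primarily controlling size and coordination of subnets of packets of data.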






A transport layer is serviced by each network layer and a session layer is serviced by each transport layer within each host. Transport layers accept data from their respective session layers and split the data into smaller units for transmission to the other host's transport layer, which concatenates the data for presentation to respective presentation layers. Session layers allow for enhanced communication control between the hosts. Presentation layers are serviced by their respective session layers, the presentation layers translating between data semantics and syntax which may be peculiar to each host and standardized structures of data representation. Compression and/or encryption of data may also be accomplished at the presentation level. Application layers are serviced by respective presentation layers, the application layers translating between programs particular to individual hosts and standardized programs for presentation to either an application or an end user. The TCP/IP standard includes the lower four layers and application layers, but integrates the functions of session layers and presentation layers into adjacent layers. Generally speaking, application, presentation and session layers are defined as upper layers, while transport, network and data link layers are defined as lower layers.

The rules and conventions for each layer are called the protocol of that layer, and since the protocols and general functions of each layer are roughly equivalent in various hosts, it is useful to think of communication occurring directly between identical layers of different hosts, even though these peer layers do not directly communicate without information transferring sequentially through each layer below. Each lower layer performs a service for the layer immediately above it to help with processing the communicated information. Each layer saves the information for processing and service to the next layer. Due to the multiplicity of hardware and software architectures, devices and programs commonly employed, each layer is necessary to ensure that the data can reach the intended destination in the appropriate form, regardless of variations in hardware and software that may intervene.

In preparing data for transmission from a first to a second host, some control data is added at each layer of the first host regarding the protocol of that layer, the control data being indistinguishable from the original (payload) data for all lower layers of that host. Thus an application layer attaches an application header to the payload data and sends the combined data to the presentation layer of the sending host, which receives the combined data, operates on it and adds a presentation header to the data, resulting in another combined data packet. The data resulting from combination of payload data, application header and presentation header is then passed to the session layer, which performs required operations including attaching a session header to the data and presenting the resulting combination of data to the transport layer. This process continues as the information moves to lower layers, with a transport header, network header and data link header and trailer attached to the data at each of those layers, each step typically including data moving and copying, before the data is sent as bit packets over the network to the second host.
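The sending-side sequence just described can be illustrated with a minimal Python sketch. The layer names follow the text, but the bracketed text tags standing in for headers and the trailer, and the `encapsulate` function itself, are hypothetical placeholders rather than real protocol formats. The sketch shows only that each layer treats everything handed down from above as opaque payload, adds its own control data, and, as in a conventional software stack, produces a fresh copy at every step.

```python
# Hypothetical layer names and text tags; real protocol headers are binary structures.
LAYERS_TOP_DOWN = ["application", "presentation", "session", "transport", "network", "data link"]


def encapsulate(payload: bytes) -> tuple[bytes, int]:
    """Add one header per layer, from the application layer down to the data link layer.

    Returns the finished frame and the number of buffer copies made, modelling a
    conventional stack in which every layer produces a new copy of the data.
    """
    data = payload
    copies = 0
    for layer in LAYERS_TOP_DOWN:
        header = f"[{layer} hdr]".encode()
        # Only the data link layer also appends a trailer (e.g. a frame check sequence).
        trailer = b"[data link trlr]" if layer == "data link" else b""
        # Everything received from above is treated as opaque payload.
        data = header + data + trailer  # concatenation builds a new bytes object
        copies += 1
    return data, copies


frame, copies = encapsulate(b"payload")
print(copies)  # 6
```

The outermost header in the result belongs to the data link layer, because it is the last to be added.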

The receiving host generally performs the converse of the above-described process, beginning with receiving the bits from the network, as headers are removed and data processed in order from the lowest (physical) layer to the highest (application) layer before transmission to a destination on the receiving host. Each layer of the receiving host recognizes and manipulates only the headers associated with that layer, since to that layer the higher layer control data is included with and indistinguishable from the payload data. Multiple interrupts, valuable central processing unit (CPU) processing time and repeated data copies may also be necessary for the receiving host to place the data in an appropriate form at its intended destination.
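A receiving-side counterpart, again a hypothetical Python sketch using placeholder text tags rather than real headers, strips one header per layer from the bottom up. Each layer checks and removes only its own control data and passes the remainder upward as opaque payload, and each slice produces a new copy of the buffer, which is the per-layer cost the passage describes.

```python
# Hypothetical text tags matching a frame built top-down with one header per layer.
LAYERS_BOTTOM_UP = ["data link", "network", "transport", "session", "presentation", "application"]


def decapsulate(frame: bytes) -> tuple[bytes, int]:
    """Strip headers from the lowest layer to the highest.

    Each layer checks and removes only its own header (plus the trailer at the
    data link layer) and passes the rest upward. Returns the payload and the
    number of buffer copies made.
    """
    data = frame
    copies = 0
    for layer in LAYERS_BOTTOM_UP:
        header = f"[{layer} hdr]".encode()
        trailer = b"[data link trlr]" if layer == "data link" else b""
        if not (data.startswith(header) and data.endswith(trailer)):
            raise ValueError(f"malformed frame at the {layer} layer")
        data = data[len(header):len(data) - len(trailer)]  # slicing copies the bytes
        copies += 1
    return data, copies


payload, copies = decapsulate(
    b"[data link hdr][network hdr][transport hdr][session hdr]"
    b"[presentation hdr][application hdr]payload[data link trlr]"
)
print(payload, copies)  # b'payload' 6
```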

The above description of layered protocol processing is simplified, as college-level textbooks devoted primarily to this subject are available, such as Computer Networks, Third Edition (1996) by Andrew S. Tanenbaum, which is incorporated herein by reference. As defined in that book, a computer network is an interconnected collection of autonomous computers, such as internet and intranet devices, including local area networks (LANs), wide area networks (WANs), asynchronous transfer mode (ATM), ring or token ring, wired, wireless, satellite or other means for providing communication capability between separate processors. A computer is defined herein to include a device having both logic and memory functions for processing data, while computers or hosts connected to a network are said to be heterogeneous if they function according to different operating systems or communicate via different architectures.

As networks grow increasingly popular and the information communicated thereby becomes increasingly complex and copious, the need for such protocol processing has increased. It is estimated that a large fraction of the processing power of a host CPU may be devoted to controlling protocol processes, diminishing the ability of that CPU to perform other tasks. Network interface cards have been developed to help with the lowest layers, such as the physical and data link layers. It is also possible to increase protocol processing speed by simply adding more processing power or CPUs according to conventional arrangements. This solution, however, is both awkward and expensive. But the complexities presented by various networks, protocols, architectures, operating systems and applications generally require extensive processing to afford communication capability between various network hosts.

The current invention provides a device for processing network communication that greatly increases the speed of that processing and the efficiency of transferring data being communicated. The invention has been achieved by questioning the long-standing practice of performing multilayered protocol processing on a general-purpose processor. The protocol processing method and architecture that results effectively collapses the layers of a connection-based, layered architecture such as TCP/IP into a single wider layer which is able to send network data more directly to and from a desired location or buffer on a host. This accelerated processing is provided to a host for both transmitting and receiving data, and so improves performance whether one or both hosts involved in an exchange of information have such a feature.



The accelerated processing includes employing representative control instructions for a given message that allow data from the message to be processed via a fast-path which accesses message data directly at its source or delivers it directly to its intended destination. This fast-path bypasses conventional protocol processing of headers that accompany the data. The fast-path employs a specialized microprocessor designed for processing network communication, avoiding the delays and pitfalls of conventional software layer processing, such as repeated copying and interrupts to the CPU. In effect, the fast-path replaces the states that are traditionally found in several layers of a conventional network stack with a single state machine encompassing all those layers, in contrast to conventional rules that require rigorous differentiation and separation of protocol layers. The host retains a sequential protocol processing stack which can be employed for setting up a fast-path connection or processing message exceptions. The specialized microprocessor and the host intelligently choose whether a given message or portion of a message is processed by the microprocessor or the host stack.

FIG. 1 is a plan view diagram of a device of the present invention, including a host computer having a communication-processing device for accelerating network communication.

FIG. 2 is a diagram of information flow for the host of FIG. 1 in processing network communication, including a fast-path, a slow-path and a transfer of connection context between the fast and slow-paths.
FIG. 3 is a flow chart of message receiving according to the present invention.

FIG. 4A is a diagram of information flow for the host of FIG. 1 receiving a message packet processed by the slow-path.

FIG. 4B is a diagram of information flow for the host of FIG. 1 receiving an initial message packet processed by the fast-path.

FIG. 4C is a diagram of information flow for the host of FIG. 4B receiving a subsequent message packet processed by the fast-path.

FIG. 4D is a diagram of information flow for the host of FIG. 4C receiving a message packet having an error that causes processing to revert to the slow-path.

FIG. 5 is a diagram of information flow for the host of FIG. 1 transmitting a message by either the fast or slow-paths.

FIG. 6 is a diagram of information flow for a first embodiment of an intelligent network interface card (INIC) associated with a client having a TCP/IP processing stack.

FIG. 7 is a diagram of hardware logic for the INIC embodiment shown in FIG. 6, including a packet control sequencer and a fly-by sequencer.

FIG. 8 is a diagram of the fly-by sequencer of FIG. 7 for analyzing header bytes as they are received by the INIC.

FIG. 9 is a diagram of information flow for a second embodiment of an INIC associated with a server having a TCP/IP processing stack.

FIG. 10 is a diagram of a command driver installed in the host of FIG. 9 for creating and controlling a communication control block for the fast-path.

FIG. 11 is a diagram of the TCP/IP stack and command driver of FIG. 10 configured for NetBIOS communications.

FIG. 12 is a diagram of a communication exchange between the client of FIG. 6 and the server of FIG. 9.

FIG. 13 is a diagram of hardware functions included in the INIC of FIG. 9.

FIG. 14 is a diagram of a trio of pipelined microprocessors included in the INIC of FIG. 13, including three phases with a processor in each phase.

FIG. 15A is a diagram of a first phase of the pipelined microprocessor of FIG. 14.

FIG. 15B is a diagram of a second phase of the pipelined microprocessor of FIG. 14.

FIG. 15C is a diagram of a third phase of the pipelined microprocessor of FIG. 14.

FIG. 16 is a diagram of a plurality of queue storage units that interact with the microprocessor of FIG. 14 and include SRAM and DRAM.

FIG. 17 is a diagram of a set of status registers for the queue storage units of FIG. 16.

FIG. 18 is a diagram of a queue manager, which interacts with the queue storage units and status registers of FIG. 16 and FIG. 17.

FIGS. 19A-D are diagrams of various stages of a least-recently-used register that is employed for allocating cache memory.

FIG. 20 is a diagram of the devices used to operate the least-recently-used register of FIGS. 19A-D.

FIG. 1 shows a host 20 of the present invention connected by a network 25 to a remote host 22. The increase in processing speed achieved by the present invention can be provided with an intelligent network interface card (INIC) that is easily and affordably added to an existing host, or with a communication processing device (CPD) that is integrated into a host, in either case freeing the host CPU from most protocol processing and allowing improvements in other tasks performed by that CPU. The host 20 in a first embodiment contains a CPU 28 and a CPD 30 connected by a PCI bus 33. The CPD 30 includes a microprocessor designed for processing communication data and memory buffers controlled by a direct memory access (DMA) unit. Also connected to the PCI bus 33 is a storage device 35, such as a semiconductor memory or disk drive, along with any related controls.

Referring additionally to FIG. 2, the host CPU 28 controls a protocol processing stack 44 housed in storage 35, the stack including a data link layer 36, network layer 38, transport layer 40, upper layer 46 and an upper layer interface 42. The upper layer 46 may represent a session, presentation and/or application layer, depending upon the particular protocol being employed and message communicated. The upper layer interface 42, along with the CPU 28 and any related controls, can send or retrieve a file to or from the upper layer 46 or storage 35, as shown by arrow 48. A connection context 50 has been created, as will be explained below, the context summarizing various features of the connection, such as protocol type and source and destination addresses for each protocol layer. The context may be passed between an interface for the session layer 42 and the CPD 30, as shown by arrows 52 and 54, and stored as a communication control block (CCB) at either CPD 30 or storage 35.

When the CPD 30 holds a CCB defining a particular connection, data received by the CPD from the network and pertaining to the connection is referenced to that CCB and can then be sent directly to storage 35 according to a fast-path 58, bypassing sequential protocol processing by the data link 36, network 38 and transport 40 layers. Transmitting a message, such as sending a file from storage 35 to remote host 22, can also occur via the fast-path 58, in which case the context for the file data is added by the CPD 30 referencing a CCB, rather than by sequentially adding headers during processing by the transport 40, network 38 and data link 36 layers. The DMA controllers of the CPD 30 perform these transfers between CPD and storage 35.
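The fast-path/slow-path decision described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `CCB` fields, the four-tuple `ConnKey`, the `CPD` class and the `slow_path` callback stand in for hardware and host-stack behavior that this description does not define at code level, and the payload passed to `receive` is assumed to have already had its lower-layer headers removed.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical connection key: (source address, destination address, source port, destination port).
ConnKey = tuple[str, str, int, int]


@dataclass
class CCB:
    """Hypothetical communication control block holding one connection's context."""
    key: ConnKey
    protocol: str
    # Stands in for the destination in host storage 35 reserved for this connection.
    destination: bytearray = field(default_factory=bytearray)


class CPD:
    """Toy model of the communication processing device's receive decision."""

    def __init__(self, slow_path: Callable[[ConnKey, bytes], None]):
        self.ccbs: dict[ConnKey, CCB] = {}  # CCBs currently held by the CPD
        self.slow_path = slow_path          # hands a frame to host protocol stack 44

    def install(self, ccb: CCB) -> None:
        """Host passes a CCB down; the CPD now owns fast-path processing for it."""
        self.ccbs[ccb.key] = ccb

    def receive(self, key: ConnKey, payload: bytes) -> str:
        ccb = self.ccbs.get(key)
        if ccb is not None:
            # Fast-path: data goes straight to its destination (a DMA in hardware).
            ccb.destination.extend(payload)
            return "fast"
        # No CCB: sequential processing by the host stack (slow-path).
        self.slow_path(key, payload)
        return "slow"


host_frames = []
cpd = CPD(slow_path=lambda k, p: host_frames.append((k, p)))
key = ("10.0.0.2", "10.0.0.1", 5000, 80)
print(cpd.receive(key, b"hello"))  # slow
ccb = CCB(key, "TCP/IP")
cpd.install(ccb)
print(cpd.receive(key, b"world"))  # fast
print(bytes(ccb.destination))      # b'world'
```

A single dictionary lookup replaces the per-layer processing here, which mirrors the idea of one state machine standing in for several layers.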

The CPD 30 collapses multiple protocol stacks each having possible separate states into a single state machine for fast-path processing. As a result, exception conditions may occur that are not provided for in the single state machine, primarily because such conditions occur infrequently and to deal with them on the CPD would provide little or no performance benefit to the host. Such exceptions can be CPD 30 or CPU 28 initiated. An advantage of the invention includes the manner in which unexpected situations that occur on a fast-path CCB are handled. The CPD 30 deals with these rare situations by passing back or flushing to the host protocol stack 44 the CCB and any associated message frames involved, via a control negotiation. The exception condition is then processed in a conventional manner by the host protocol stack 44. At some later time, usually directly after the handling of the exception condition has completed and fast-path processing can resume, the host stack 44 hands the CCB back to the CPD.
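The flush-and-return cycle can be modelled, under simplifying assumptions, as a change in which side owns the CCB. In this hypothetical Python sketch the `Connection` record, the `is_exception` flag and both functions are illustrative only; the actual control negotiation between the CPD and the host stack is not specified at this level. Frames arriving while the host owns the CCB are routed to the host as well, so their order is preserved.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical record of which side currently owns a connection's CCB."""
    owner: str = "cpd"  # "cpd" for fast-path, "host" while the host stack holds the CCB
    pending: list[bytes] = field(default_factory=list)    # frames flushed to the host
    delivered: list[bytes] = field(default_factory=list)  # data placed at its destination


def cpd_receive(conn: Connection, frame: bytes, is_exception: bool) -> None:
    """CPD receive with fallback; is_exception is a hypothetical 'cannot handle' flag."""
    if conn.owner == "cpd" and not is_exception:
        conn.delivered.append(frame)  # normal fast-path delivery
        return
    # Exception, or CCB already flushed: pass the CCB and this frame to the host.
    conn.owner = "host"
    conn.pending.append(frame)


def host_process_and_return(conn: Connection) -> None:
    """Host stack handles the flushed frames conventionally, then hands the CCB back."""
    while conn.pending:
        conn.delivered.append(conn.pending.pop(0))
    conn.owner = "cpd"  # fast-path processing resumes
```

For example, after `cpd_receive(conn, b"bad", True)` the connection belongs to the host until `host_process_and_return(conn)` runs.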

This fallback capability enables the performance-impacting functions of the host protocols to be handled by the CPD network microprocessor, while the exceptions are dealt with by the host stacks, the exceptions being so rare as to negligibly affect overall performance. The custom designed network microprocessor can have independent processors for transmitting and receiving network information, and further processors for assisting and queuing. A preferred microprocessor embodiment includes a pipelined trio of receive, transmit and utility processors. DMA controllers are integrated into the implementation and work in close concert with the network microprocessor to quickly move data between buffers adjacent to the controllers and other locations such as long term storage. Providing buffers logically adjacent to the DMA controllers avoids unnecessary loads on the PCI bus.

FIG. 3 diagrams the general flow of messages received according to the current invention. A large TCP/IP message such as a file transfer may be received by the host from the network in a number of separate, approximately 64 KB transfers, each of which may be split into many, approximately 1.5 KB frames or packets for transmission over a network. Novell NetWare protocol suites running Sequenced Packet Exchange Protocol (SPX) or NetWare Core Protocol (NCP) over Internetwork Packet Exchange (IPX) work in a similar fashion. Another form of data communication which can be handled by the fast-path is Transaction TCP (hereinafter T/TCP or TTCP), a version of TCP which initiates a connection with an initial transaction request after which a reply containing data may be sent according to the connection, rather than initiating a connection via a several-message initialization dialogue and then transferring data with later messages. In any of the transfers typified by these protocols, each packet conventionally includes a portion of the data being transferred, as well as headers for each of the protocol layers and markers for positioning the packet relative to the rest of the packets of this message.
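To give a sense of scale, the calculation below estimates how many frames one 64 KB transfer needs. It assumes standard Ethernet with a 1500-byte MTU and 20-byte IPv4 and TCP headers without options, leaving 1460 payload bytes per frame; these exact sizes are assumptions, since the text says only "approximately 1.5 KB".

```python
import math

TRANSFER_BYTES = 64 * 1024   # one approximately 64 KB transfer
MTU = 1500                   # assumed Ethernet MTU (IP packet size limit)
IP_HEADER = 20               # assumed IPv4 header, no options
TCP_HEADER = 20              # assumed TCP header, no options

payload_per_frame = MTU - IP_HEADER - TCP_HEADER   # payload bytes per frame
frames = math.ceil(TRANSFER_BYTES / payload_per_frame)
print(payload_per_frame, frames)  # 1460 45
```

In a conventional stack each of those 45 frames carries, and must have processed, its own full set of protocol headers.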

When a message packet or frame is received 47 from a network by the CPD, it is first validated by a hardware assist. This includes determining the protocol types of the various layers, verifying relevant checksums, and summarizing 57 these findings into a status word or words. Included in these words is an indication whether or not the frame is a candidate for fast-path data flow. Selection 59 of fast-path candidates is based on whether the host may benefit from this message connection being handled by the CPD, which includes determining whether the packet has header bytes indicating particular protocols, such as TCP/IP or SPX/IPX for example. The small percent of frames that are not fast-path candidates are sent 61 to the host protocol stacks for slow-path protocol processing. Subsequent network microprocessor work with each fast-path candidate determines whether a fast-path connection such as a TCP or SPX CCB is already extant for that candidate, or whether that candidate may be used to set up a new fast-path connection, such as for a TTCP/IP transaction. The validation provided by the CPD provides acceleration whether a frame is processed by the fast-path or a slow-path, as only error-free, validated frames are processed by the host CPU even for the slow-path processing.
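A software analogue of this validation step might look like the following Python sketch. It is an illustration under stated assumptions, not the CPD's logic: only Ethernet II framing carrying IPv4 is recognized, only the IPv4 header checksum (RFC 1071) is verified (a real assist would also check the TCP checksum over its pseudo-header), and the dictionary returned by the hypothetical `summarize` function stands in for the packed status word.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def summarize(frame: bytes) -> dict:
    """Validate an Ethernet frame and summarize it (hypothetical status layout)."""
    status = {"valid": False, "protocols": [], "fast_path_candidate": False}
    if len(frame) < 14:
        return status                       # runt frame: invalid
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:
        status["valid"] = True              # not IPv4: nothing more this sketch checks
        return status
    ip = frame[14:]
    if len(ip) < 20 or ip[0] >> 4 != 4:
        return status
    ihl = (ip[0] & 0x0F) * 4                # IPv4 header length in bytes
    # A correct header, including its own checksum field, sums to zero.
    if ihl < 20 or len(ip) < ihl or internet_checksum(ip[:ihl]) != 0:
        return status
    status["valid"] = True
    status["protocols"].append("IPv4")
    if ip[9] == 6:                          # IP protocol number 6 is TCP
        status["protocols"].append("TCP")
        status["fast_path_candidate"] = True
    return status
```

A frame that fails validation never reaches the host at all, which is why the passage notes that validation helps the slow-path as well.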

All received message frames which have been determined by the CPD hardware assist to be fast-path candidates are examined 53 by the network microprocessor or INIC comparator circuits to determine whether they match a CCB held by the CPD. Upon confirming such a match, the CPD removes lower layer headers and sends 69 the remaining application data from the frame directly into its final destination in the host using direct memory access (DMA) units of the CPD. This operation may occur immediately upon receipt of a message packet, for example when a TCP connection already exists and destination buffers have been negotiated, or it may first be necessary to process an initial header to acquire a new set of final destination addresses for this transfer. In this latter case, the CPD will queue subsequent message packets while waiting for the destination address, and then DMA the queued application data to that destination.
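The deferred-delivery case, in which payloads are held until the host supplies a destination, can be sketched in Python as follows. The `FastPathReceiver` class and its methods are hypothetical, and `bytearray.extend` stands in for a DMA transfer into host memory.

```python
from collections import deque
from typing import Optional


class FastPathReceiver:
    """Hypothetical fast-path delivery for one connection whose CCB has matched.

    Payloads that arrive before a destination buffer is known are queued; once
    the host supplies the destination, the queue is drained in arrival order.
    """

    def __init__(self) -> None:
        self.destination: Optional[bytearray] = None
        self.queue: deque = deque()

    def on_payload(self, payload: bytes) -> None:
        if self.destination is None:
            self.queue.append(payload)         # wait for the destination address
        else:
            self.destination.extend(payload)   # stands in for a DMA into host memory

    def set_destination(self, buffer: bytearray) -> None:
        self.destination = buffer
        while self.queue:                      # deliver queued data, oldest first
            buffer.extend(self.queue.popleft())
```

When a destination was negotiated in advance, `set_destination` would simply be called before the first payload arrives and nothing is ever queued.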

A fast-path candidate that does not match a CCB may be used to set up a new fast-path connection, by sending 65 the frame to the host for sequential protocol processing. In this case, the host uses this frame to create 51 a CCB, which is then passed to the CPD to control subsequent frames on that connection. The CCB, which is cached 67 in the CPD, includes control and state information pertinent to all protocols that would have been processed had conventional software layer processing been employed. The CCB also contains storage space for per-transfer information used to facilitate moving application-level data contained within subsequent related message packets directly to a host application in a form available for immediate usage. The CPD takes command of connection processing upon receiving a CCB for that connection from the host.

As shown more specifically in FIG. 4A, when a message packet is received from the remote host 22 via network 25, the packet enters hardware receive logic 32 of the CPD 30, which checks the header and data for errors and summarizes the validated headers in a status word or words, which are stored in memory 60 along with the packet. The status word indicates whether the packet is a candidate for fast-path processing. FIG. 4A depicts the case in which the packet is not a fast-path candidate, in which case the CPD 30 sends the validated headers and data from memory 60 to data link layer 36 along an internal bus for processing by the host CPU, as shown by arrow 56. The packet is processed by the host protocol stack 44 of data link 36, network 38, transport 40 and session 42 layers, and data (D) 63 from the packet may then be sent to storage 35, as shown by arrow 65.

FIG. 4B depicts the case in which the receive logic 32 of the CPD determines that a message packet is a candidate for fast-path processing, for example by deriving from the packet's headers that the packet belongs to a TCP/IP, TTCP/IP or SPX/IPX message. A processor 55 in the CPD 30 then checks to see whether the word that summarizes the fast-path candidate matches a CCB held in a cache 62. Upon finding no match for this packet, the CPD sends the validated packet from memory 60 to the host protocol stack 44 for processing. Host stack 44 may use this packet to create a connection context for the message, including finding and reserving a destination for data from the message associated with the packet, the context taking the form of a CCB. The present embodiment employs a single specialized host stack 44 for processing both fast-path candidates and non-fast-path candidates, while in another embodiment fast-path candidates are processed by a different host stack than non-fast-path candidates. Some data (D1) 66 from that initial packet may optionally be sent to the destination in storage 35, as shown by arrow 68. The CCB is then sent to the CPD 30 to be saved in cache 62, as shown by arrow 64. For a traditional connection-based message such as typified by TCP/IP, the initial packet may be part of a connection initialization dialogue that transpires between hosts before the CCB is created and passed to the CPD 30.

Referring now to FIG. 4C, when a subsequent packet from the same connection as the initial packet is received from the network 25 by CPD 30, the packet headers and data are validated by the receive logic 32, and the headers are parsed to create a summary of the message packet and a hash for finding a corresponding CCB, the summary and hash contained in a word or words. The word or words are temporarily stored in memory 60 along with the packet. The processor 55 checks for a match between the hash and each CCB that is stored in the cache 62 and, upon finding a match, sends the data (D2) 70 via a fast-path directly to the destination in storage 35, as shown by arrow 72, bypassing the session layer 42, transport layer 40, network layer 38 and data link layer 36. The remaining data packets from the message can also be sent by DMA directly to storage, avoiding the relatively slow protocol layer processing and repeated copying by the CPU stack 44.
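The hash used to find a CCB is not specified here. As an assumption for illustration only, the Python sketch below hashes the TCP/IP four-tuple with CRC-32 (`zlib.crc32`, a swapped-in choice) into a small bucketed cache, then compares the full tuple so that two connections with the same hash value cannot be confused. `NUM_BUCKETS`, `conn_hash` and `CCBCache` are hypothetical names.

```python
import zlib

NUM_BUCKETS = 256  # hypothetical cache size


def conn_hash(src_ip: bytes, dst_ip: bytes, src_port: int, dst_port: int) -> int:
    """Hash a TCP/IP four-tuple into a bucket index (illustrative choice: CRC-32)."""
    key = src_ip + dst_ip + src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big")
    return zlib.crc32(key) % NUM_BUCKETS


class CCBCache:
    """Hypothetical bucketed CCB cache standing in for cache 62."""

    def __init__(self) -> None:
        self.buckets: list = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, four_tuple: tuple, ccb: object) -> None:
        self.buckets[conn_hash(*four_tuple)].append((four_tuple, ccb))

    def lookup(self, four_tuple: tuple):
        """Return the matching CCB (fast-path) or None (slow-path)."""
        for key, ccb in self.buckets[conn_hash(*four_tuple)]:
            if key == four_tuple:  # equal hashes do not imply the same connection
                return ccb
        return None
```

Hashing narrows the search to one bucket, so only a few full comparisons are needed instead of one per cached CCB.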

FIG. 4D shows the procedure for handling the rare instance when a message for which a fast-path connection has been established, such as shown in FIG. 4C, has a packet that is not easily handled by the CPD. In this case the packet is sent to be processed by the protocol stack 44, which is handed the CCB for that message from cache 62 via a control dialogue with the CPD, as shown by arrow 76, signaling to the CPU to take over processing of that message. Slow-path processing by the protocol stack then results in data (D3) 80 from the packet being sent, as shown by arrow 82, to storage 35. Once the packet has been processed and the error situation corrected, the CCB can be handed back via a control dialogue to the cache 62, so that payload data from subsequent packets of that message can again be sent via the fast-path of the CPD 30. Thus the CPU and CPD together decide whether a given message is to be processed according to fast-path hardware processing or more conventional software processing by the CPU.

Transmission of a message from the host 20 to the network 25 for delivery to remote host 22 also can be processed by either sequential protocol software processing via the CPU or accelerated hardware processing via the CPD 30, as shown in FIG. 5. A message (M) 90 that is selected by CPU 28 from storage 35 can be sent to session layer 42 for processing by stack 44, as shown by arrows 92 and 96. For the situation in which a connection exists and the CPD 30 already has an appropriate CCB for the message, however, data packets can bypass host stack 44 and be sent by DMA directly to memory 60, with the processor 55 adding to each data packet a single header containing all the appropriate protocol layers, and sending the resulting packets to the network 25 for transmission to remote host 22. This fast-path transmission can greatly accelerate processing for even a single packet, with the acceleration multiplied for a larger message.

A message for which a fast-path connection is not extant thus may benefit from creation of a CCB with appropriate control and state information for guiding fast-path transmission. For a traditional connection-based message, such as typified by TCP/IP or SPX/IPX, the CCB is created during the connection initialization dialogue. For a quick-connection message, such as typified by TTCP/IP, the CCB can be created with the same transaction that transmits payload data. In this case, the transmission of payload data may be a reply to a request that was used to set up the fast-path connection. In any case, the CCB provides protocol and status information regarding each of the protocol layers, including which user is involved, and storage space for per-transfer information. The CCB is created by protocol stack 44, which then passes the CCB to the CPD 30 by writing to a command register of the CPD, as shown by arrow 98. Guided by the CCB, the processor 55 moves network frame-sized portions of the data from the source in host memory 35 into its own memory 60 using DMA, as depicted by arrow 99. The processor 55 then prepends appropriate headers and checksums to the data portions, and transmits the resulting frames to the network 25, consistent with the restrictions of the associated protocols. After the CPD 30 has received an acknowledgement that all the data has reached its destination, the CPD will then notify the host 20 by writing to a response buffer.
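A minimal Python sketch of the transmit side: host data is split into frame-sized portions and each portion gets one combined header. The 8-byte header layout here (4-byte sequence number, 2-byte length, 2-byte RFC 1071 checksum over the payload) is a simplified hypothetical stand-in for the real TCP, IP and data link headers the processor 55 would build from the CCB, and the 1460-byte segment size is likewise an assumption.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fast_path_frames(data: bytes, start_seq: int, mss: int = 1460) -> list[bytes]:
    """Split data into frames, each prefixed by one combined (hypothetical) header."""
    frames = []
    for offset in range(0, len(data), mss):
        chunk = data[offset:offset + mss]
        seq = (start_seq + offset) & 0xFFFFFFFF  # sequence numbers count bytes, wrap at 2**32
        header = struct.pack("!IHH", seq, len(chunk), internet_checksum(chunk))
        frames.append(header + chunk)
    return frames


print(len(fast_path_frames(b"x" * 3000, start_seq=1000)))  # 3
```

Because the header is computed from the CCB and the data offset alone, no per-layer pass over the data is needed.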

Thus, fast-path transmission of data communications also relieves the host CPU of per-frame processing. A vast majority of data transmissions can be sent to the network by the fast-path. Both the input and output fast-paths attain a huge reduction in interrupts by functioning at an upper layer level, i.e., session level or higher, and interactions between the network microprocessor and the host occur using the full transfer sizes which that upper layer wishes to make. For fast-path communications, an interrupt only occurs (at the most) at the beginning and end of an entire upper-layer message transaction, and there are no interrupts for the sending or receiving of each lower layer portion or packet of that transaction.
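A back-of-the-envelope comparison, assuming (hypothetically) one interrupt per received frame for a conventional stack and, per the text, at most two per message on the fast-path. Real adapters may coalesce interrupts, so the conventional figure is an upper-end illustration rather than a measurement; the 1460-byte frame payload is also an assumption.

```python
import math


def interrupts_per_message(message_bytes: int, frame_payload: int, fast_path: bool) -> int:
    """Hypothetical interrupt count for one upper-layer message."""
    if fast_path:
        return 2  # upper bound from the text: one at the start, one at the end
    return math.ceil(message_bytes / frame_payload)  # assumed: one per received frame


message = 64 * 1024
print(interrupts_per_message(message, 1460, fast_path=False),
      interrupts_per_message(message, 1460, fast_path=True))  # 45 2
```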

A simplified intelligent network interface card (INIC) 150 is shown in FIG. 6 to provide a network interface for a host 152. Hardware logic 171 of the INIC 150 is connected to a network 155, with a peripheral bus (PCI) 157 connecting the INIC and host. The host 152 in this embodiment has a TCP/IP protocol stack, which provides a slow-path 158 for sequential software processing of message frames received from the network 155. The host 152 protocol stack includes a data link layer 160, network layer 162, a transport layer 164 and an application layer 166, which provides a source or destination 168 for the communication data in the host 152. Other layers which are not shown, such as session and presentation layers, may also be included in the host stack 152, and the source or destination may vary depending upon the nature of the data and may actually be the application layer.

The INIC 150 has a network processor 170 which chooses between processing messages along a slow-path 158 that includes the protocol stack of the host, or along a fast-path 159 that bypasses the protocol stack of the host. Each received packet is processed on the fly by hardware logic 171 contained in INIC 150, so that all of the protocol headers for a packet can be processed without copying, moving or storing the data between protocol layers. The hardware logic 171 processes the headers of a given packet at one time as packet bytes pass through the hardware, by categorizing selected header bytes. Results of processing the selected bytes help to determine which other bytes of the packet are categorized, until a summary of the packet has been created, including checksum validations. The processed headers and data from the received packet are then stored in INIC storage 185, as well as the word or words summarizing the headers and status of the packet. For a network storage configuration, the INIC 150 may be connected to a peripheral storage device such as a disk drive which has an IDE, SCSI or similar interface, with a file cache for the storage device residing on the memory 185 of the INIC 150. Several such network interfaces may exist for a host, with each interface having an associated storage device.
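The on-the-fly categorization can be modelled in software as a single pass over the bytes, in which fields already seen decide what later bytes mean and nothing is copied between layers. This Python sketch makes simplifying assumptions (Ethernet II followed by IPv4, TCP recognized by protocol number 6, only a few fields extracted), and the `fly_by_summary` function and its summary keys are hypothetical.

```python
def fly_by_summary(stream) -> dict:
    """Categorize header bytes in one pass as they arrive, one byte at a time."""
    summary = {}
    ethertype = 0
    ip_start = 14                             # IPv4 header follows the 14-byte Ethernet header
    ihl = 0
    for i, byte in enumerate(stream):
        if i in (12, 13):                     # EtherType, big-endian
            ethertype = (ethertype << 8) | byte
            if i == 13:
                summary["ethertype"] = ethertype
        elif ethertype == 0x0800 and i == ip_start:
            ihl = (byte & 0x0F) * 4           # decides where the transport header starts
            summary["ip_version"] = byte >> 4
        elif ethertype == 0x0800 and i == ip_start + 9:
            summary["ip_protocol"] = byte     # decides how the transport bytes are read
        elif summary.get("ip_protocol") == 6 and ihl and i in (ip_start + ihl + 2, ip_start + ihl + 3):
            # TCP destination port, big-endian
            summary["tcp_dst_port"] = (summary.get("tcp_dst_port", 0) << 8) | byte
    return summary
```

Each tier of decisions here depends only on fields captured earlier in the same pass, which is the property that lets hardware do this without buffering the headers first.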

The hardware processing of message packets received by INIC 150 from network 155 is shown in more detail in FIG. 7. A received message packet first enters a media access controller 172, which controls INIC access to the network and receipt of packets and can provide statistical information for network protocol management. From there, data flows one byte at a time into an assembly register 174, which in this example is 128 bits wide. The data is categorized by a fly-by sequencer 178, as will be explained in more detail with regard to FIG. 8, which examines the bytes of a packet as they fly by, and generates status from those bytes that will be used to summarize the packet. The status thus created is merged with the data by a multiplexor 180 and the resulting data stored in SRAM 182. A packet control sequencer 176 oversees the fly-by sequencer 178, examines information from the media access controller 172, counts the bytes of data, generates addresses, moves status and manages the movement of data from the assembly register 174 to SRAM 182 and eventually DRAM 188. The packet control sequencer 176 manages a buffer in SRAM 182 via an SRAM controller, and also indicates to a DRAM controller 186 when data needs to be moved from SRAM 182 to a buffer in DRAM 188. Once data movement for the packet has been completed and all the data has been moved to the buffer in DRAM 188, the packet control sequencer 176 will move the status that has been generated in the fly-by sequencer 178 out to the SRAM 182 and to the beginning of the DRAM 188 buffer to be prepended to the packet data. The packet control sequencer 176 then requests a queue manager 184 to enter a receive buffer descriptor into a receive queue, which in turn notifies the processor 170 that the packet has been processed by hardware logic 171 and its status summarized.

FIG. 8 shows that the fly-by sequencer 178 has several tiers, with each tier generally
focusing on a particular portion of the packet header and thus on a particular protocol layer,
for generating status pertaining to that layer. The fly-by sequencer 178 in this embodiment
includes a media access control sequencer 191, a network sequencer 192, a transport
sequencer 194 and a session sequencer 195. Sequencers pertaining to higher protocol layers
can additionally be provided. The fly-by sequencer 178 is reset by the packet control
sequencer 176 and given pointers by the packet control sequencer that tell the fly-by
sequencer whether a given byte is available from the assembly register 174. The media
access control sequencer 191 determines, by looking at bytes 0-5, that a packet is addressed
to host 152 rather than or in addition to another host. Offsets 12 and 13 of the packet are also
processed by the media access control sequencer 191 to determine the type field, for example
whether the packet is Ethernet or 802.3. If the type field is Ethernet, those bytes also tell the
media access control sequencer 191 the packet's network protocol type. For the 802.3 case,
those bytes instead indicate the length of the entire frame, and the media access control
sequencer 191 will check eight bytes further into the packet to determine the network layer
type.
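The MAC-level tests described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the sequencer's actual logic: it assumes 802.3 frames carry an LLC/SNAP header (so the network type sits eight bytes past offset 12, at offset 20), and it treats any group address (low bit of byte 0 set) as also addressed to the host. The function name `classify_mac_header` is hypothetical.

```python
import struct

ETHERTYPE_MIN = 0x0600   # type/length values >= 1536 are EtherTypes (Ethernet II)
MAX_8023_LENGTH = 1500   # values <= 1500 are 802.3 frame lengths


def classify_mac_header(frame: bytes, host_mac: bytes) -> dict:
    """Sketch of the MAC-level checks: destination address and type/length field."""
    dest = frame[0:6]
    # Bytes 0-5: our unicast address, or a group (multicast/broadcast) address.
    for_host = dest == host_mac or bool(dest[0] & 0x01)
    type_or_len = struct.unpack_from("!H", frame, 12)[0]   # offsets 12 and 13
    if type_or_len >= ETHERTYPE_MIN:
        # Ethernet II: the field is the network protocol type itself.
        return {"for_host": for_host, "framing": "ethernet",
                "net_type": type_or_len, "net_offset": 14}
    if type_or_len <= MAX_8023_LENGTH:
        # 802.3: the field is a length. Assuming LLC/SNAP (DSAP, SSAP, control,
        # 3-byte OUI), the network type is eight bytes further, at offset 20.
        net_type = struct.unpack_from("!H", frame, 20)[0]
        return {"for_host": for_host, "framing": "802.3",
                "frame_length": type_or_len, "net_type": net_type,
                "net_offset": 22}
    return {"for_host": for_host, "framing": "unknown"}
```

For an 802.3/SNAP frame carrying IP, the sketch reports `net_type == 0x0800` and a network header starting at byte 22.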

For most packets the network sequencer 192 validates that the header length received
has the correct length, and checksums the network layer header. For fast-path candidates the
network layer header is known to be IP or IPX from analysis done by the media access
control sequencer 191. Assuming for example that the type field is 802.3 and the network
protocol is IP, the network sequencer 192 analyzes the first bytes of the network layer header,
which will begin at byte 22, in order to determine IP type. The first bytes of the IP header
will be processed by the network sequencer 192 to determine what IP type the packet
involves. Determining that the packet involves, for example, IP version 4, directs further
processing by the network sequencer 192, which also looks at the protocol type field, the
tenth byte of the IP header, for an indication of the transport header protocol of the packet.
For example, for IP over Ethernet, the IP header begins at offset 14, and the protocol type byte
is offset 23, which will be processed by network logic to determine whether the transport layer
protocol is TCP, for example. From the length of the network layer header, which is
typically 20-40 bytes, network sequencer 192 determines the beginning of the packet's
transport layer header for validating the transport layer header. Transport sequencer 194 may
generate checksums for the transport layer header and data, which may include information
from the IP header in the case of TCP at least.
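A software analogue of these network-layer checks is sketched below, assuming IPv4 and the standard Internet one's-complement header checksum. It reads the version and header length from the first header byte, the protocol field at the tenth byte (offset 9), validates the header checksum, and computes where the transport header begins. The names `parse_ipv4` and `ones_complement_sum` are hypothetical.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total


def parse_ipv4(packet: bytes, offset: int) -> dict:
    """Check an IPv4 header starting at `offset` and locate the transport header."""
    first = packet[offset]
    version, ihl = first >> 4, (first & 0x0F) * 4   # header length in bytes
    if version != 4 or ihl < 20 or offset + ihl > len(packet):
        raise ValueError("not a valid IPv4 header")
    header = packet[offset:offset + ihl]
    return {
        "header_len": ihl,
        "protocol": packet[offset + 9],                 # tenth byte: 6 means TCP
        "checksum_ok": ones_complement_sum(header) == 0xFFFF,
        "transport_offset": offset + ihl,
    }
```

For IP over Ethernet (`offset=14`), the protocol byte read here is frame offset 23, matching the example in the text.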
Continuing with the example of a TCP packet, transport sequencer 194 also analyzes
the first few bytes in the transport layer portion of the header to determine, in part, the TCP
source and destination ports for the message, such as whether the packet is NetBios or other
protocols. Byte 12 of the TCP header is processed by the transport sequencer 194 to
determine and validate the TCP header length. Byte 13 of the TCP header contains flags that
may, aside from ack flags and push flags, indicate unexpected options, such as reset and fin,
that may cause the processor to categorize this packet as an exception. TCP offset bytes 16
and 17 are the checksum, which is pulled out and stored by the hardware logic 171 while the
rest of the frame is validated against the checksum.
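The TCP checks described above can be sketched like this. It is an illustration only: the decision to treat SYN and URG as exceptions, alongside the reset and fin flags named in the text, is an assumption, as is using well-known port 139 to spot NetBIOS session traffic. The function name `inspect_tcp_header` is hypothetical.

```python
import struct

# TCP flag bits carried in byte 13 of the TCP header
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
NETBIOS_SESSION_PORT = 139  # well-known NetBIOS session service port


def inspect_tcp_header(segment: bytes) -> dict:
    """Pull out the TCP fields the transport sequencer is described as examining."""
    src_port, dst_port = struct.unpack_from("!HH", segment, 0)
    header_len = (segment[12] >> 4) * 4                   # byte 12: data offset in 32-bit words
    if header_len < 20 or header_len > len(segment):
        raise ValueError("invalid TCP header length")
    flags = segment[13]                                   # byte 13: flag bits
    checksum = struct.unpack_from("!H", segment, 16)[0]   # bytes 16 and 17
    # Anything besides ACK or PSH (here FIN, SYN, RST or URG) marks an exception.
    exception = bool(flags & (FIN | SYN | RST | URG))
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "header_len": header_len,
        "netbios": NETBIOS_SESSION_PORT in (src_port, dst_port),
        "exception": exception,
        "checksum": checksum,
    }
```

A plain data segment (ACK and PSH set) passes as a fast-path candidate, while one carrying RST is flagged for exception handling.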

Session sequencer 195 determines the length of the session layer header, which in the
case of NetBios is only four bytes, two of which tell the length of the NetBios payload data,
but which can be much larger for other protocols. The session sequencer 195 can also be
used to categorize the type of message as read or write, for example, for which the fast-path
may be particularly beneficial. Further upper layer logic processing, depending upon the
message type, can be performed by the hardware logic 171 of packet control sequencer 176
and fly-by sequencer 178. Thus hardware logic 171 intelligently directs hardware processing
of the headers by categorization of selected bytes from a single stream of bytes, with the
status of the packet being built from classifications determined on the fly. Once the packet
control sequencer 176 detects that all of the packet has been processed by the fly-by
sequencer 178, the packet control sequencer 176 adds the status information generated by the
fly-by sequencer 178 and any status information generated by the packet control sequencer
176, and prepends (adds to the front) that status information to the packet, for convenience in
handling the packet by the processor 170. The additional status information generated by the
packet control sequencer 176 includes media access controller 172 status information and any
errors discovered, or data overflow in either the assembly register or DRAM buffer, or other
miscellaneous information regarding the packet. The packet control sequencer 176 also
stores entries into a receive buffer queue and a receive statistics queue via the queue manager
184.
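The session-layer step can be illustrated with a short sketch that reads the four-byte NetBIOS session header and, when an SMB header follows, categorizes the command as a read or write. It assumes the RFC 1002 layout (type byte, flags byte whose low bit extends the length, then a 16-bit length) and the standard SMB command codes listed in the table; `inspect_session_layer` is a hypothetical name.

```python
# Standard SMB command codes for common read and write requests
SMB_READ_WRITE = {
    0x0A: "read",   # SMB_COM_READ
    0x1A: "read",   # SMB_COM_READ_RAW (Read Block Raw)
    0x2E: "read",   # SMB_COM_READ_ANDX
    0x0B: "write",  # SMB_COM_WRITE
    0x1D: "write",  # SMB_COM_WRITE_RAW
    0x2F: "write",  # SMB_COM_WRITE_ANDX
}


def inspect_session_layer(payload: bytes) -> dict:
    """Parse a 4-byte NetBIOS session header and categorize a following SMB command."""
    if len(payload) < 4:
        raise ValueError("truncated NetBIOS session header")
    msg_type = payload[0]
    # RFC 1002: 16-bit length plus one extension bit in the flags byte
    length = ((payload[1] & 0x01) << 16) | int.from_bytes(payload[2:4], "big")
    smb = payload[4:]
    kind = "other"
    if len(smb) >= 5 and smb[:4] == b"\xffSMB":
        kind = SMB_READ_WRITE.get(smb[4], "other")   # command code follows the magic
    return {"msg_type": msg_type, "length": length, "kind": kind}
```

A 76-byte Read Block Raw request, for instance, yields `length == 76` and `kind == "read"`.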

An advantage of processing a packet by hardware logic 171 is that the packet does
not, in contrast with conventional sequential software protocol processing, have to be stored,
moved, copied or pulled from storage for processing each protocol layer header, offering
dramatic increases in processing efficiency and savings in processing time for each packet.
The packets can be processed at the rate bits are received from the network, for example 100
megabits/second for a 100BaseT connection. The time for categorizing a packet received at
this rate and having a length of sixty bytes is thus about 5 microseconds. The total time for
processing this packet with the hardware logic 171 and sending packet data to its host
destination via the fast-path may be about 16 microseconds or less, assuming a 66 MHz PCI
bus, whereas conventional software protocol processing by a 300 MHz Pentium II® processor
may take as much as 200 microseconds in a busy system. More than an order of magnitude
decrease in processing time can thus be achieved with fast-path 159 in comparison with a
high-speed CPU employing conventional sequential software protocol processing,
demonstrating the dramatic acceleration provided by processing the protocol headers by the
hardware logic 171 and processor 170, without even considering the additional time savings
afforded by the reduction in CPU interrupts and host bus bandwidth savings.
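The wire-time figure for a sixty-byte packet follows from simple arithmetic: at 1 Mb/s one bit arrives per microsecond, so the arrival time in microseconds is the bit count divided by the rate in megabits per second. A minimal check (the function name is illustrative):

```python
def wire_time_us(frame_bytes: int, link_mbps: float) -> float:
    """Time for frame_bytes to arrive at link_mbps; 1 Mb/s is 1 bit per microsecond."""
    return frame_bytes * 8 / link_mbps


# 60 bytes = 480 bits; at 100 Mb/s that is 4.8 microseconds, i.e. "about 5".
print(wire_time_us(60, 100))
```

By the same arithmetic, a maximum-size 1514-byte Ethernet frame takes about 121 microseconds to arrive at 100 Mb/s.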

The processor 170 chooses, for each received message packet held in storage 185,
whether that packet is a candidate for the fast-path 159 and, if so, checks to see whether a
fast-path has already been set up for the connection that the packet belongs to. To do this, the
processor 170 first checks the header status summary to determine whether the packet
headers are of a protocol defined for fast-path candidates. If not, the processor 170
commands DMA controllers in the INIC 150 to send the packet to the host for slow-path 158
processing. Even for slow-path 158 processing of a message, the INIC 150 thus performs
initial procedures such as validation and determination of message type, and passes the
validated message at least to the data link layer 160 of the host.

For fast-path 159 candidates, the processor 170 checks to see whether the header
status summary matches a CCB held by the INIC. If so, the data from the packet is sent
along fast-path 159 to the destination 168 in the host. If the fast-path 159 candidate's packet
summary does not match a CCB held by the INIC, the packet may be sent to the host 152 for
slow-path processing to create a CCB for the message. Employment of the fast-path 159 may
also not be needed or desirable for the case of fragmented messages or other complexities.
For the vast majority of messages, however, the INIC fast-path 159 can greatly accelerate
message processing. The INIC 150 thus provides a single state machine processor 170 that
decides whether to send data directly to its destination, based upon information gleaned on
the fly, as opposed to the conventional employment of a state machine in each of several
protocol layers for determining the destiny of a given packet.
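The routing decision just described can be summarized as a small function. This is a sketch under stated assumptions: the `StatusSummary` fields and the set of candidate protocol pairs are hypothetical stand-ins for the hardware status word and the protocols "defined for fast-path candidates", and a plain set of connection keys stands in for the INIC's CCBs.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    FAST_PATH = "fast-path"
    SLOW_PATH = "slow-path"


# Protocol pairs treated as fast-path candidates in this sketch (an assumption)
FAST_PATH_PROTOCOLS = {("IP", "TCP"), ("IPX", "SPX")}


@dataclass(frozen=True)
class StatusSummary:
    """Hypothetical stand-in for the header status summary built by the hardware."""
    network: str
    transport: str
    connection: tuple          # e.g. (src addr, src port, dst addr, dst port)
    fragmented: bool = False
    exception: bool = False    # e.g. unexpected TCP flags


def choose_route(summary: StatusSummary, ccbs: set) -> Route:
    # 1. Not a fast-path protocol: hand the packet to the host stack.
    if (summary.network, summary.transport) not in FAST_PATH_PROTOCOLS:
        return Route.SLOW_PATH
    # 2. Fragments and other complexities also go to the host stack.
    if summary.fragmented or summary.exception:
        return Route.SLOW_PATH
    # 3. Fast-path only if a CCB already exists for this connection;
    #    otherwise the host processes it (and may create a CCB).
    return Route.FAST_PATH if summary.connection in ccbs else Route.SLOW_PATH
```

Keeping the decision in one place mirrors the text's point that a single state machine, rather than one per protocol layer, determines each packet's destiny.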

In processing an indication or packet received at the host 152, a protocol driver of the
host selects the processing route based upon whether the indication is fast-path or slow-path.
A TCP/IP or SPX/IPX message has a connection that is set up from which a CCB is formed
by the driver and passed to the INIC for matching with and guiding the fast-path packet to the
connection destination 168. For a TTCP/IP message, the driver can create a connection
context for the transaction from processing an initial request packet, including locating the
message destination 168, and then passing that context to the INIC in the form of a CCB for
providing a fast-path for a reply from that destination. A CCB includes connection and state
information regarding the protocol layers and packets of the message. Thus a CCB can
include source and destination IP or IPX addresses, source and destination TCP or SPX ports,
TCP variables such as timers, receive and transmit windows for sliding window protocols,
and information indicating the session layer protocol.

Caching the CCBs in a hash table in the INIC provides quick comparisons with words
summarizing incoming packets to determine whether the packets can be processed via the
fast-path 159, while the full CCBs are also held in the INIC for processing. Other ways to
accelerate this comparison include software processes such as a B-tree or hardware assists
such as a content addressable memory (CAM). When INIC microcode or comparator circuits
detect a match with the CCB, a DMA controller places the data from the packet in the
destination 168, without any interrupt by the CPU, protocol processing or copying.
Depending upon the type of message received, the destination of the data may be the session,
presentation or application layers, or a file buffer cache in the host 152.
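A CCB keyed by its connection, held in a hash table, can be sketched as follows. The fields are a hypothetical subset of those the text lists (addresses, ports, windows, session protocol), the bucket count is arbitrary, and appending to a `bytearray` merely models the DMA placement of data into destination 168. None of these names come from the source.

```python
from dataclasses import dataclass, field
from typing import Optional

ConnKey = tuple  # (src_ip, src_port, dst_ip, dst_port)


@dataclass
class CCB:
    """Hypothetical CCB holding a few of the fields listed in the text."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    rcv_window: int = 65535
    snd_window: int = 65535
    session_protocol: str = "NetBios"
    # Stand-in for the host destination buffer that DMA would fill
    destination: bytearray = field(default_factory=bytearray)

    @property
    def key(self) -> ConnKey:
        return (self.src_ip, self.src_port, self.dst_ip, self.dst_port)


class CCBHashTable:
    def __init__(self, buckets: int = 256):
        self._buckets = [[] for _ in range(buckets)]

    def _bucket(self, key: ConnKey) -> list:
        return self._buckets[hash(key) % len(self._buckets)]

    def insert(self, ccb: CCB) -> None:
        self._bucket(ccb.key).append(ccb)

    def lookup(self, key: ConnKey) -> Optional[CCB]:
        # The hash narrows the search to one bucket; a full compare confirms it.
        for ccb in self._bucket(key):
            if ccb.key == key:
                return ccb
        return None


def deliver(table: CCBHashTable, key: ConnKey, payload: bytes) -> bool:
    """On a match, place payload directly in the CCB's destination (models the DMA)."""
    ccb = table.lookup(key)
    if ccb is None:
        return False        # no matching CCB: slow-path
    ccb.destination += payload
    return True
```

A B-tree or CAM, as mentioned above, would replace the bucket scan while keeping the same match-then-deliver structure.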

FIG. 9 shows an INIC 200 connected to a host 202 that is employed as a file server.
This INIC provides a network interface for several network connections employing the
802.3u standard, commonly known as Fast Ethernet. The INIC 200 is connected by a PCI
bus 257 to the server 202, which maintains a TCP/IP or SPX/IPX protocol stack including
MAC layer 212, network layer 215, transport layer 217 and application layer 220, with a
source/destination 222 shown above the application layer, although as mentioned earlier the
application layer can be the source or destination. The INIC is also connected to network
lines 210, 240, 242 and 244, which are preferably Fast Ethernet, twisted pair, fiber optic,
coaxial cable or other lines, each allowing data transmission of 100 Mb/s, while faster and
slower data rates are also possible. Network lines 210, 240, 242 and 244 are each connected
to a dedicated row of hardware circuits which can each validate and summarize message
packets received from their respective network line. Thus line 210 is connected with a first
horizontal row of sequencers 250, line 240 is connected with a second horizontal row of
sequencers 260, line 242 is connected with a third horizontal row of sequencers 262 and line
244 is connected with a fourth horizontal row of sequencers 264. After a packet has been
validated and summarized by one of the horizontal hardware rows it is stored along with its
status summary in storage 270.

A network processor 230 determines, based on that summary and a comparison with
any CCBs stored in the INIC 200, whether to send a packet along a slow-path 231 for
processing by the host. A large majority of packets can avoid such sequential processing and
have their data portions sent by DMA along a fast-path 237 directly to the data destination
222 in the server according to a matching CCB. Similarly, the fast-path 237 provides an
avenue to send data directly from the source 222 to any of the network lines by processor 230
division of the data into packets and addition of full headers for network transmission. Each
of the sequencer rows 250, 260, 262 and 264 offers full duplex communication, concurrently
with all other sequencer rows. The specialized INIC 200 is much faster at working with
message packets than even advanced general-purpose host CPUs that process those headers
sequentially according to the software protocol stack.

One of the most commonly used network protocols for large messages such as file
transfers is server message block (SMB) over TCP/IP. SMB can operate in conjunction with
redirector software that determines whether a required resource for a particular operation,
such as a printer or a disk upon which a file is to be written, resides in or is associated with
the host from which the operation was generated or is located at another host connected to the
network, such as a file server. SMB and server/redirector are conventionally serviced by the
transport layer; in the present invention SMB and redirector can instead be serviced by the
INIC. In this case, sending data by the DMA controllers from the INIC buffers when
receiving a large SMB transaction may greatly reduce interrupts that the host must handle.
Moreover, this DMA generally moves the data to its final destination in the file device cache.
An SMB transmission of the present invention follows essentially the reverse of the above
described SMB receive, with data transferred from the host to the INIC, while the associated
protocol headers are prepended to the data in the INIC for transmission via a network line to a
remote host. Processing by the INIC of the multiple packets and multiple TCP, IP, NetBios
and SMB protocol layers via custom hardware and without repeated interrupts of the host can
greatly increase the speed of transmitting an SMB message to a network line.



As shown in FIG. 10, for controlling whether a given message is processed by the
host 202 or by the INIC 200, a message command driver 300 may be installed in host 202 to
work in concert with a host protocol stack 310. The command driver 300 can intervene in
message reception or transmittal, create CCBs and send or receive CCBs from the INIC 200,
so that functioning of the INIC, aside from improved performance, is transparent to a user.
Also shown is an INIC memory 304 and an INIC miniport driver 306, which can direct
message packets received from network 210 to either the conventional protocol stack 310 or
the command protocol stack 300, depending upon whether a packet has been labeled as a
fast-path candidate. The conventional protocol stack 310 has a data link layer 312, a network
layer 314 and a transport layer 316 for conventional, lower layer processing of messages that
are not labeled as fast-path candidates and therefore not processed by the command stack
300. Residing above the lower layer stack 310 is an upper layer 318, which represents a
session, presentation and/or application layer, depending upon the message communicated.
The command driver 300 similarly has a data link layer 320, a network layer 322 and a
transport layer 325.

The driver 300 includes an upper layer interface 330 that determines, for transmission
of messages to the network 210, whether a message transmitted from the upper layer 318 is to
be processed by the command stack 300 and subsequently the INIC fast-path, or by the
conventional stack 310. When the upper layer interface 330 receives an appropriate message
from the upper layer 318 that would conventionally be intended for transmission to the
network after protocol processing by the protocol stack of the host, the message is passed to
driver 300. The INIC then acquires network-sized portions of the message data for that
transmission via INIC DMA units, prepends headers to the data portions and sends the
resulting message packets down the wire. Conversely, in receiving a TCP, TTCP, SPX or
similar message packet from the network 210 to be used in setting up a fast-path connection,
miniport driver 306 diverts that message packet to command driver 300 for processing. The
driver 300 processes the message packet to create a context for that message, with the driver
300 passing the context and command instructions back to the INIC 200 as a CCB for
sending data of subsequent messages for the same connection along a fast-path. Hundreds of
TCP, TTCP, SPX or similar CCB connections may be held indefinitely by the INIC, although
a least recently used (LRU) algorithm is employed for the case when the cache is full.
The driver 300 can also create a connection context for a TTCP request which is passed to the
INIC 200 as a CCB, allowing fast-path transmission of a TTCP reply to the request. A
message having a protocol that is not accelerated can be processed conventionally by
protocol stack 310.

FIG. 11 shows a TCP/IP implementation of command driver software for Microsoft®
protocol messages. A conventional host protocol stack 350 includes MAC layer 353, IP layer
355 and TCP layer 358. A command driver 360 works in concert with the host stack 350 to
process network messages. The command driver 360 includes a MAC layer, an IP layer
366 and an Alacritech TCP (ATCP) layer 373. The conventional stack 350 and command
driver 360 share a network driver interface specification (NDIS) layer 375, which interacts
with the INIC miniport driver 306. The INIC miniport driver 306 sorts receive indications
for processing by either the conventional host stack 350 or the ATCP driver 360. A TDI
filter driver and upper layer interface similarly determine whether messages sent from a
TDI user 382 to the network are diverted to the command driver and perhaps to the fast-path
of the INIC, or processed by the host stack.

FIG. 12 depicts a typical SMB exchange between a client 190 and server 290, both of
which have communication devices of the present invention, the communication devices each
holding a CCB defining their connection for fast-path movement of data. The client 190
includes INIC 150, 802.3 compliant data link layer 160, IP layer 162, TCP layer 164, NetBios
layer 166, and SMB layer 168. The client has a slow-path 157 and fast-path 159 for
communication processing. Similarly, the server 290 includes INIC 200, 802.3 compliant
data link layer 212, IP layer 215, TCP layer 217, NetBios layer 220, and SMB 222. The
server is connected to network lines 240, 242 and 244, as well as line 210 which is connected
to client 190. The server also has a slow-path 231 and fast-path 237 for communication
processing.

Assuming that the client 190 wishes to read a 100KB file on the server 290, the client
may begin by sending a Read Block Raw (RBR) SMB command across network 210
requesting the first 64KB of that file on the server 290. The RBR command may be only 76
bytes, for example, so the INIC 200 on the server will recognize the message type (SMB) and
relatively small message size, and send the 76 bytes directly via the fast-path to NetBios of
the server. NetBios will give the data to SMB, which processes the Read request and fetches
the 64KB of data into server data buffers. SMB then calls NetBios to send the data,
whereupon NetBios outputs the data for the client. In a conventional host, NetBios would call
TCP output and pass 64KB to TCP, which would divide the data into 1460 byte segments and
output each segment via IP and eventually MAC (slow-path 231). In the present case, the
64KB data goes to the ATCP driver along with an indication regarding the client-server SMB
connection, which denotes a CCB held by the INIC. The INIC 200 then proceeds to DMA
1460 byte segments from the host buffers, add the appropriate headers for TCP, IP and MAC
at one time, and send the completed packets on the network 210 (fast-path 237). The INIC
200 will repeat this until the whole 64KB transfer has been sent. Usually after receiving
acknowledgment from the client that the 64KB has been received, the INIC will then send
the remaining 36KB also by the fast-path 237.
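The transmit side of this example, splitting a host buffer into 1460-byte segments and prepending headers to each, can be sketched as follows. `make_headers` is a hypothetical callback standing in for building the MAC, IP and TCP headers from the CCB; a real implementation would also fill in checksums and handle 32-bit sequence-number wraparound, which this sketch omits.

```python
from typing import Callable, List

MSS = 1460  # TCP payload bytes per segment, as in the example above


def fast_path_segments(data: bytes, start_seq: int,
                       make_headers: Callable[[int, int], bytes],
                       mss: int = MSS) -> List[bytes]:
    """Split data into MSS-sized segments and prepend headers to each one."""
    frames = []
    for offset in range(0, len(data), mss):
        payload = data[offset:offset + mss]
        # Headers for a segment starting at this sequence number and length.
        frames.append(make_headers(start_seq + offset, len(payload)) + payload)
    return frames
```

For a 64KB (65,536-byte) transfer this produces 45 frames: 44 full 1460-byte segments and a final 1296-byte segment.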

With INIC 150 operating on the client 190 when this reply arrives, the INIC 150
recognizes from the first frame received that this connection is receiving fast-path 159
processing (TCP/IP, NetBios, matching a CCB), and the ATCP may use this first frame to
acquire buffer space for the message. This latter case is done by passing the first bytes of the
NetBios/SMB frame, including the NetBios and SMB headers, up to the client NetBios/SMB
layers, which analyze these headers, recognize that this is a reply to the original Read Block
Raw request, and give the ATCP a list of buffers into which to place the data. At this stage
only one frame has arrived, although more may arrive while this processing is occurring. As
soon as the client buffer list is given to the ATCP, it passes that transfer information to the
INIC 150, and the INIC 150 starts DMAing any frame data that has accumulated into those
buffers.

FIG. 13 provides a simplified diagram of the INIC 200, which combines the functions
of a network interface controller and a protocol processor in a single ASIC chip 400. The
INIC 200 in this embodiment offers a full-duplex, four channel, 10/100-Megabit per second
(Mbps) intelligent network interface controller that is designed for high speed protocol
processing for server applications. Although designed specifically for server applications, the
INIC 200 can be connected to personal computers, workstations, routers or other hosts
anywhere that TCP/IP, TTCP/IP or SPX/IPX protocols are being utilized.

The INIC 200 is connected with four network lines 210, 240, 242 and 244, which may
transport data along a number of different conduits, such as twisted pair, coaxial cable or
optical fiber, each of the connections providing a media independent interface (MII) via
commercially available physical layer chips, such as model 80220/80221 Ethernet Media
Interface Adapter from SEEQ Technology Incorporated, 47200 Bayside Parkway, Fremont,
CA 94538. The lines preferably are 802.3 compliant and in connection with the INIC
constitute four complete Ethernet nodes, the INIC supporting 10Base-T, 10Base-T2,
100Base-TX, 100Base-FX and 100Base-T4 as well as future interface standards. Physical
layer identification and initialization is accomplished through host driver initialization
routines. The connection between the network lines 210, 240, 242 and 244 and the INIC 200
is controlled by MAC units MAC-A 402, MAC-B 404, MAC-C 406 and MAC-D 408 which
contain logic circuits for performing the basic functions of the MAC sublayer, essentially
controlling when the INIC accesses the network lines 210, 240, 242 and 244. The MAC units
402-408 may act in promiscuous, multicast or unicast modes, allowing the INIC to function
as a network monitor, receive broadcast and multicast packets and implement multiple MAC
addresses for each node. The MAC units 402-408 also provide statistical information that
can be used for simple network management protocol (SNMP).
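The three receive modes of the MAC units can be illustrated with a destination-address filter. This is a software sketch only, with hypothetical names: a set of station addresses models "multiple MAC addresses for each node", and a set of joined groups models multicast reception.

```python
BROADCAST = b"\xff" * 6


def accept_frame(dest: bytes, station_addrs: set, multicast_groups: set,
                 promiscuous: bool = False) -> bool:
    """Decide, by destination address, whether a MAC unit would pass a frame up."""
    if promiscuous:               # network monitor: accept everything
        return True
    if dest == BROADCAST:
        return True
    if dest[0] & 0x01:            # group bit set: multicast
        return dest in multicast_groups
    return dest in station_addrs  # unicast: any of this node's addresses
```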

The MAC units 402, 404, 406 and 408 are each connected to a transmit and receive
sequencer, XMT & RCV-A 418, XMT & RCV-B 420, XMT & RCV-C 422 and XMT &
RCV-D 424, by wires 410, 412, 414 and 416, respectively. Each of the transmit and receive
sequencers can perform several protocol processing steps on the fly as message frames pass
through that sequencer. In combination with the MAC units, the transmit and receive
sequencers 418-424 can compile the packet status for the data link, network, transport,
session and, if appropriate, presentation and application layer protocols in hardware, greatly
reducing the time for such protocol processing compared to conventional sequential software
engines. The transmit and receive sequencers 418-424 are connected, by lines 426, 428, 430
and 432, to an SRAM and DMA controller 444, which includes DMA controllers 438 and
SRAM controller 442. Static random access memory (SRAM) buffers 440 are coupled with
SRAM controller 442 by line 441. The SRAM and DMA controllers 444 interact across line
446 with external memory control 450 to send and receive frames via external memory bus
455 to and from dynamic random access memory (DRAM) buffers 460, which are located
adjacent to the IC chip 400. The DRAM buffers 460 may be configured as 4 MB, 8 MB, 16
MB or 32 MB, and may optionally be disposed on the chip. The SRAM and DMA
controllers 444 are connected via line 464 to a PCI Bus Interface Unit (BIU) 468, which
manages the interface between the INIC 200 and the PCI interface bus 257. The 64-bit,
multiplexed BIU 468 provides a direct interface to the PCI bus 257 for both slave and master
functions. The INIC 200 is capable of operating in either a 64-bit or 32-bit PCI environment,
while supporting 64-bit addressing in either configuration.

A microprocessor 470 is connected by line 472 to the SRAM and DMA controllers
444, and connected via line 475 to the PCI BIU 468. Microprocessor 470 instructions and
register files reside in an on-chip control store 480, which includes a writable on-chip control
store (WCS) of SRAM and a read only memory (ROM), and is connected to the
microprocessor by line 477. The microprocessor 470 offers a programmable state machine
which is capable of processing incoming frames, processing host commands, directing
network traffic and directing PCI bus traffic. Three processors are implemented using shared
hardware in a three-level pipelined architecture that launches and completes a single
instruction for every clock cycle. A receive processor 482 is primarily used for receiving
communications while a transmit processor 484 is primarily used for transmitting
communications in order to facilitate full duplex communication, while a utility processor
486 offers various functions including overseeing and controlling PCI register access.

The instructions for the three processors 482, 484 and 486 reside in the on-chip
control store 480. Thus the functions of the three processors can be easily redefined, so that
the microprocessor 470 can be adapted for a given environment. For instance, the amount of
processing required for receive functions may outweigh that required for either transmit or
utility functions. In this situation, some receive functions may be performed by the transmit
processor 484 and/or the utility processor 486. Alternatively, an additional level of
pipelining can be created to yield four or more virtual processors instead of three, with the
additional level devoted to receive functions.

The INIC 200 in this embodiment can support up to 256 CCBs which are maintained
in a table in the DRAM 460. There is also, however, a CCB index in hash order in the
SRAM 440 to save sequential searching. Once a hash has been generated, the CCB is cached
in SRAM, with up to sixteen cached CCBs in SRAM in this example. Allocation of the
sixteen CCBs cached in SRAM is handled by a least recently used register, described below.
These cache locations are shared between the transmit 484 and receive 482 processors so that
the processor with the heavier load is able to use more cache buffers. There are also eight
header buffers and eight command buffers to be shared between the sequencers. A given
header or command buffer is not statically linked to a specific CCB buffer, as the link is
dynamic on a per-frame basis.
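The sixteen-entry SRAM cache in front of the larger DRAM CCB table behaves like a least-recently-used cache. The sketch below is a software analogy only: it uses Python's `OrderedDict` in place of the hardware LRU register, which is not described in this section, and the class name `SramCCBCache` and its write-back-on-eviction behavior are assumptions.

```python
from collections import OrderedDict


class SramCCBCache:
    """Hypothetical LRU cache of CCBs in front of a larger DRAM table."""

    def __init__(self, dram_table: dict, slots: int = 16):
        self.dram = dram_table          # up to 256 CCBs in this embodiment
        self.slots = slots              # sixteen SRAM cache slots
        self.sram = OrderedDict()       # ordered least to most recently used
        self.misses = 0

    def get(self, key):
        if key in self.sram:
            self.sram.move_to_end(key)  # mark as most recently used
            return self.sram[key]
        self.misses += 1
        ccb = self.dram[key]            # KeyError: no CCB for this connection
        if len(self.sram) >= self.slots:
            old_key, old_ccb = self.sram.popitem(last=False)  # evict LRU slot
            self.dram[old_key] = old_ccb                      # write it back
        self.sram[key] = ccb
        return ccb
```

Because both the transmit and receive processors draw from the same pool of slots, whichever processor touches more connections naturally ends up occupying more of the cache.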

FIG. 14 shows an overview of the pipelined microprocessor 470, in which instructions
for the receive, transmit and utility processors are executed in three alternating phases
according to Clock increments I, II and III, the phases corresponding to each of the pipeline
stages. Each phase is responsible for different functions, and each of the three processors
occupies a different phase during each Clock increment. Each processor usually operates
upon a different instruction stream from the control store 480, and each carries its own
program counter and status through each of the phases.

In general, a first instruction phase 500 of the pipelined microprocessor completes an
instruction and stores the result in a destination operand, fetches the next instruction, and
stores that next instruction in an instruction register. A first register set 490 provides a
number of registers including the instruction register, and a set of controls 492 for the first
register set provides the controls for storage to the first register set 490. Some items pass
through the first phase without modification by the controls 492, and instead are simply
copied into the first register set 490 or a RAM file register 533. A second instruction phase
560 has an instruction decoder and operand multiplexer 498 that generally decodes the
instruction that was stored in the instruction register of the first register set 490 and gathers
any operands which have been generated, which are then stored in a decode register of a
second register set 496. The first register set 490, second register set 496 and a third register
set 501, which is employed in a third instruction phase 600, include a number of the same
registers, as can be seen in the more detailed views of FIGS. 15A-C. The instruction decoder
and operand multiplexer 498 can read from the address and data ports of the RAM file
register 533, which operates in both the first phase 500 and the second phase 560. The third
phase 600 of the processor 470 has an arithmetic logic unit (ALU) which generally performs
any ALU operations on the operands from the second register set, storing the results in a
results register included in the third register set 501. A stack exchange can reorder
register stacks, and a queue manager 503 can arrange queues for the processor 470, the
results of which are stored in the third register set.

The instructions continue with the first phase following the third phase, as depicted by a circular pipeline 505. Note that various functions have been distributed across the three phases of the instruction execution in order to minimize the combinatorial delays within any given phase. With a frequency in this embodiment of 66 MHz, each Clock increment takes 15 nanoseconds to complete, for a total of 45 nanoseconds to complete one instruction for each of the three processors. The rotating instruction phases are depicted in more detail in FIGS. 15A-C, in which each phase is shown in a different figure.
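The rotation of the three processors through the three phases can be made concrete with a small software model. The Python sketch below is illustrative only: the processor labels, the `schedule` and `instruction_latency_ns` functions, and the particular starting stagger are assumptions; only the three phases (500, 560, 600), the circular ordering and the 15 ns Clock increment come from the text.

```python
PROCESSORS = ("receive", "transmit", "utility")
PHASES = ("phase 500", "phase 560", "phase 600")  # first, second, third phase
CLOCK_NS = 15  # per Clock increment in the 66 MHz embodiment (about 1/66 MHz)


def schedule(num_clocks):
    """For each Clock increment, map every phase to the processor occupying it."""
    table = []
    for clock in range(num_clocks):
        row = {}
        for p, name in enumerate(PROCESSORS):
            # Each processor advances one phase per increment and wraps from the
            # third phase back to the first (circular pipeline 505). The stagger
            # keeps all three phases busy with three different processors.
            row[PHASES[(p + clock) % len(PHASES)]] = name
        table.append(row)
    return table


def instruction_latency_ns():
    """One instruction visits every phase once, one phase per increment."""
    return len(PHASES) * CLOCK_NS


if __name__ == "__main__":
    for clock, row in enumerate(schedule(3)):
        print(clock, row)
    print(instruction_latency_ns(), "ns per instruction per processor")
```

Because every processor occupies a distinct phase on every increment, each of the three completes one instruction every three increments (45 ns), while the shared hardware finishes an instruction on every increment.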

More particularly, FIG. 15A shows some specific hardware functions of the first phase 500, which generally includes the first register set 490 and related controls 492. The controls for the first register set 492 include an SRAM address and data control 515, which is a logical control for loading address and write data into SRAM address and data registers 520. Thus the output of the ALU 502 from the third phase 600 may be placed by control 515 into an address register or a data register of SRAM address and data registers 520. A load control 521 similarly provides controls for writing a context for a file to file context register 522, and another load control 523 provides controls for storing a variety of miscellaneous data to flip-flop registers 525. ALU condition codes, such as whether a carry bit is set, get clocked into ALU condition codes register 528 without an operation performed in the first phase 500. Flag decodes 508 can perform various functions, such as setting locks, that get stored in flag registers 530.

The RAM file register 533 has a single write port for addresses and data and two read ports for addresses and data, so that more than one register can be read from at one time. As noted above, the RAM file register 533 essentially straddles the first and second phases, as it is written in the first phase 500 and read from in the second phase 560. A control store instruction 510 allows the reprogramming of the processors due to new data in from the address stored in a fetch address register 538. A load control provides instructions for a program counter 540, which operates much like the fetch address for the control store. A last-in first-out stack 544 of three registers is copied to the first register set without undergoing other operations in this phase. Finally, a load control 517 for a debug address 548 is optionally included, which allows correction of errors that may occur.

FIG. 15B depicts the second microprocessor phase 560, which includes reading addresses and data out of the RAM file register 533. A scratch SRAM 565 is written from SRAM address and data register 520 of the first register set, which includes a register that passes through the first two phases to be incremented in the third. The scratch SRAM 565 is read by the instruction decoder and operand multiplexer 498, as are most of the registers from the first register set, with the exception of the stack 544, debug address 548 and SRAM address and data register mentioned above. The instruction decoder and operand multiplexer 498 looks at the various registers of set 490 and SRAM 565, decodes the instructions and gathers the operands for operation in the next phase, in particular determining the operands to provide to the ALU 602 below. The outcome of the instruction decoder and operand multiplexer 498 is stored to a number of registers in the second register set 496, including ALU operands 579 and 582, ALU condition code register 580, and a queue channel and command register 587, which in this embodiment can control thirty-two queues. Several of the registers in set 496 are loaded fairly directly from the instruction register 535 above without substantial decoding by the decoder 498, including a program control 590, a literal field 589, a test select 584 and a flag select 585. Other registers, such as the file context 522 of the first phase 500, are always stored in a file context 577 of the second phase 560, but may also be treated as an operand that is gathered by the multiplexer 572. The stack registers 544 are simply copied in stack register 594. The program counter 540 is incremented 568 in this phase and stored in register 592. Also incremented 570 is the optional debug address 548,



wo 08/13891 



and a load control 575 may be fed from the pipeline 505 at this point in order to allow error control in each phase, the result stored in debug address 598.

FIG. 15C depicts the third microprocessor phase 600, which includes ALU and queue operations. The ALU 602 includes an adder, priority encoders and other standard logic functions. Results of the ALU are stored in registers ALU output 618, ALU condition codes 620 and destination operand results 622. A file context register 616, flag select register 626 and literal field register 630 are simply copied from the previous phase 560. A test multiplexer 604 is provided to determine whether a conditional jump results in a jump, a decision that may instead be performed in the first phase 500 along with similar decisions. A stack exchange shifts a stack up or down by fetching a program counter from the stack or putting a program counter onto that stack, the results of which are stored in program control 634, program counter and stack 640 registers. The SRAM address may optionally be incremented in this phase 600. Another load control 610 for another debug address 642 may be forced from the pipeline 505 at this point in order to allow error control in this phase also. A QRAM & QALU 606, shown together in this figure, read from the queue channel and command register 587, store in SRAM and rearrange queues, adding or removing data and pointers as needed to manage the queues of data, sending results to the test multiplexer 604 and a queue flags and queue address register 628. Thus the QRAM & QALU 606 assume the duties of managing queues for the three processors, a task conventionally performed sequentially by software on a CPU, the queue manager 606 instead providing accelerated and substantially parallel hardware queuing.

FIG. 16 depicts two of the thirty-two hardware queues that are managed by the queue manager 606, with each of the queues having an SRAM head, an SRAM tail and the ability to queue information in a DRAM body also, allowing expansion and individual configuration of each queue. Thus FIFO 700 has SRAM storage units 705, 707, 709 and 711, each containing eight entries, although the number and capacity of these units may vary in other embodiments. Similarly, FIFO 702 has SRAM storage units 713, 715, 717 and 719. SRAM units 705 and 707 are the head of FIFO 700 and units 709 and 711 are the tail of that FIFO, while units 713 and 715 are the head of FIFO 702 and units 717 and 719 are the tail of that FIFO. Information for FIFO 700 may be written into head units 705 or 707, as shown by arrow 722, and read from tail units 711 or 709, as shown by arrow 725. A particular entry, however, may be both written to and read from head units 705 or 707, or may be both written to and read from tail units 709 or 711, minimizing data movement and latency. Similarly, information for FIFO 702 is typically written into head units 713 or 715, as shown by arrow 733, and read from tail units 717 or 719, as shown by arrow 739, but may instead be read from the same head or tail unit to which it was written.

The SRAM FIFOs 700 and 702 are both connected to DRAM 460, which allows virtually unlimited expansion of those FIFOs to handle situations in which the SRAM head and tail are full. For example, a first of the thirty-two queues, labeled Q-zero, may queue an entry in DRAM 460, as shown by arrow 727, by DMA units acting under direction of the queue manager, instead of being queued in the head or tail of FIFO 700. Entries stored in DRAM 460 return to SRAM unit 709, as shown by arrow 730, extending the length and fall-through time of that FIFO. Diversion from SRAM to DRAM is typically reserved for when the SRAM is full, since DRAM is slower and DMA movement causes additional latency. Thus Q-zero may comprise the entries stored by queue manager 606 in both the FIFO 700 and the DRAM 460. Likewise, information bound for FIFO 702, which may correspond to Q-twenty-seven, for example, can be moved by DMA into DRAM 460, as shown by arrow 735. The capacity for queuing in cost-effective albeit slower DRAM 460 is user-definable during initialization, allowing the queues to change in size as desired. Information queued in DRAM 460 is returned to SRAM unit 717, as shown by arrow 737.
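The head/body/tail arrangement can be modeled in software to show how FIFO order survives a detour through DRAM. The Python sketch below is a behavioral model under stated assumptions, not the queue manager's design: the `HybridQueue` class, its capacity parameters and its spill policy (move the whole SRAM head into the DRAM body when the head fills) are hypothetical. The text specifies only that entries are written at the head, read at the tail, diverted to DRAM when SRAM is full, and that the DRAM capacity is set at initialization.

```python
from collections import deque


class HybridQueue:
    """Hypothetical model of one hardware queue: SRAM head, DRAM body, SRAM tail.

    Entry order is always tail (oldest) -> body -> head (newest).
    """

    def __init__(self, sram_unit_entries=8, dram_entries=1024):
        self.head = deque()   # SRAM head: newest entries
        self.body = deque()   # DRAM body: overflow, moved by DMA in hardware
        self.tail = deque()   # SRAM tail: oldest entries, read first
        self.unit = sram_unit_entries
        self.dram_cap = dram_entries  # assumed user-definable at initialization

    def __len__(self):
        return len(self.head) + len(self.body) + len(self.tail)

    def write(self, entry):
        """Queue one entry; return False if the queue is full."""
        # Short queue: put the entry where it will be read, avoiding extra moves.
        if not self.head and not self.body and len(self.tail) < self.unit:
            self.tail.append(entry)
            return True
        if len(self.head) == self.unit:
            # SRAM head full: divert its contents to the slower DRAM body.
            if len(self.body) + len(self.head) > self.dram_cap:
                return False
            self.body.extend(self.head)
            self.head.clear()
        self.head.append(entry)
        return True

    def read(self):
        """Dequeue the oldest entry, or return None if the queue is empty."""
        if not self.tail:
            if self.body:
                # DRAM-to-SRAM move: refill the tail from the body.
                while self.body and len(self.tail) < self.unit:
                    self.tail.append(self.body.popleft())
            elif self.head:
                # Nothing older exists, so read directly from the head.
                return self.head.popleft()
            else:
                return None
        return self.tail.popleft()
```

For example, with two-entry SRAM units, writing ten entries sends the first two to the tail, cycles the middle six through the DRAM body, and leaves the last two in the head; reading then returns all ten in their original order.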

Status for each of the thirty-two hardware queues is conveniently maintained in and accessed from a set 740 of four thirty-two-bit registers, as shown in FIG. 17, in which a specific bit in each register corresponds to a specific queue. The registers are labeled Q-Out_Ready 745, Q-In_Ready 750, Q-Empty 755 and Q-Full 760. If a particular bit is set in the Q-Out_Ready register 745, the queue corresponding to that bit contains information that is ready to be read, while the setting of the same bit in the Q-In_Ready register 750 means that the queue is ready to be written. Similarly, a positive setting of a specific bit in the Q-Empty register 755 means that the queue corresponding to that bit is empty, while a positive setting of a particular bit in the Q-Full register 760 means that the queue corresponding to that bit is full. Thus Q-Out_Ready 745 contains bits zero 746 through thirty-one 748, including bits twenty-seven 752, twenty-eight 754, twenty-nine 756 and thirty 758. Q-In_Ready 750 contains bits zero 762 through thirty-one 764, including bits twenty-seven 766, twenty-eight 768, twenty-nine 770 and thirty 772. Q-Empty 755 contains bits zero 774 through thirty-one 776, including bits twenty-seven 778, twenty-eight 780, twenty-nine 782 and thirty 784, and Q-Full 760 contains bits zero 786 through thirty-one 788, including bits twenty-seven 790, twenty-eight 792, twenty-nine 794 and thirty 796.
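Keeping one bit per queue lets a requester test a queue's state with a single shift and mask. The minimal Python model below illustrates the four registers; the class, its method names and the rule that derives all four bits from an entry count and a capacity are assumptions for illustration. In the hardware the bits are maintained by the queue manager, and Q-In_Ready reflects space available in SRAM rather than a simple count.

```python
NUM_QUEUES = 32
MASK32 = (1 << NUM_QUEUES) - 1


def with_bit(reg, queue, value):
    """Return 32-bit register `reg` with bit `queue` set (value True) or cleared."""
    bit = 1 << queue
    return (reg | bit) if value else (reg & ~bit) & MASK32


class QueueStatusRegisters:
    """Hypothetical model of Q-Out_Ready, Q-In_Ready, Q-Empty and Q-Full."""

    def __init__(self):
        # After reset every queue is empty: writable but not readable.
        self.q_out_ready = 0        # Q-Out_Ready 745
        self.q_in_ready = MASK32    # Q-In_Ready 750
        self.q_empty = MASK32       # Q-Empty 755
        self.q_full = 0             # Q-Full 760

    def update(self, queue, count, capacity):
        """Refresh one queue's four status bits (assumed count/capacity model)."""
        self.q_out_ready = with_bit(self.q_out_ready, queue, count > 0)
        self.q_in_ready = with_bit(self.q_in_ready, queue, count < capacity)
        self.q_empty = with_bit(self.q_empty, queue, count == 0)
        self.q_full = with_bit(self.q_full, queue, count >= capacity)

    def can_write(self, queue):
        return bool((self.q_in_ready >> queue) & 1)

    def can_read(self, queue):
        return bool((self.q_out_ready >> queue) & 1)
```

A receive sequencer wanting to post a descriptor to Q-twenty-seven would, in this model, call `can_write(27)` and wait while it returns False.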




Q-zero, corresponding to FIFO 700, is a free buffer queue, which holds a list of addresses for all available buffers. This queue is addressed when the microprocessor or other devices need a free buffer address, and so commonly includes appreciable DRAM 460. Thus a device needing a free buffer address would check with Q-zero to obtain that address. Q-twenty-seven, corresponding to FIFO 702, is a receive buffer descriptor queue. After the receive sequencer processes a received frame, the sequencer looks to store a descriptor for the frame in Q-twenty-seven. If a location for such a descriptor is immediately available in SRAM, bit twenty-seven 766 of Q-In_Ready 750 will be set. If not, the sequencer must wait for the queue manager to initiate a DMA move from SRAM to DRAM, thereby freeing space to store the receive descriptor.

Operation of the queue manager, which manages movement of queue entries between SRAM and the processor, the transmit and receive sequencers, and also between SRAM and DRAM, is shown in more detail in FIG. 18. Requests which utilize the queues include Processor Request 802, Transmit Sequencer Request 804, and Receive Sequencer Request 806. Other requests for the queues are DRAM to SRAM Request 808 and SRAM to DRAM Request 810, which operate on behalf of the queue manager in moving data back and forth between the DRAM and the SRAM head or tail of the queues. Determining which of these various requests will get to use the queue manager in the next cycle is handled by priority logic Arbiter 815. To enable high-frequency operation the queue manager is pipelined, with Register A 818 and Register B 820 providing temporary storage, while Status Register 822 maintains status until the next update. The queue manager reserves even cycles for DMA, receive and transmit sequencer requests and odd cycles for processor requests. Dual-ported QRAM 825 stores variables regarding each of the queues, the variables for each queue including a Head Write Pointer, Head Read Pointer, Tail Write Pointer and Tail Read Pointer corresponding to the queue's SRAM condition, and a Body Write Pointer and Body Read Pointer corresponding to the queue's DRAM condition and the queue's size.
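The even/odd cycle split performed by Arbiter 815 and the per-queue variables held in QRAM 825 might be modeled as follows. This Python sketch is hypothetical: all names are ours, and the fixed priority order among the even-cycle requesters is an assumption, since the text does not state one.

```python
from dataclasses import dataclass
from enum import Enum


class Req(Enum):
    PROCESSOR = "Processor Request 802"
    TRANSMIT = "Transmit Sequencer Request 804"
    RECEIVE = "Receive Sequencer Request 806"
    DRAM_TO_SRAM = "DRAM to SRAM Request 808"
    SRAM_TO_DRAM = "SRAM to DRAM Request 810"


# Assumed fixed priority among even-cycle requesters (not given in the text).
EVEN_CYCLE_PRIORITY = (Req.RECEIVE, Req.TRANSMIT, Req.DRAM_TO_SRAM, Req.SRAM_TO_DRAM)


def arbitrate(cycle, pending):
    """Pick the request that may use the queue manager on this cycle.

    Odd cycles belong to the processor; even cycles to DMA and sequencer requests.
    """
    if cycle % 2 == 1:
        return Req.PROCESSOR if Req.PROCESSOR in pending else None
    for req in EVEN_CYCLE_PRIORITY:
        if req in pending:
            return req
    return None


@dataclass
class QramEntry:
    """Per-queue variables kept in dual-ported QRAM 825."""
    head_write: int = 0   # Head Write Pointer (SRAM)
    head_read: int = 0    # Head Read Pointer (SRAM)
    tail_write: int = 0   # Tail Write Pointer (SRAM)
    tail_read: int = 0    # Tail Read Pointer (SRAM)
    body_write: int = 0   # Body Write Pointer (DRAM)
    body_read: int = 0    # Body Read Pointer (DRAM)
    size: int = 0         # queue size


QRAM = [QramEntry() for _ in range(32)]  # one entry per hardware queue
```

Reserving fixed cycle slots means a burst of processor requests can never starve the sequencers or DMA, and vice versa, at the cost of an idle slot when only one class has work pending.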

After Arbiter 815 has selected the next operation to be performed, the variables of QRAM 825 are fetched and modified according to the selected operation by the QALU, and an SRAM Read Request 830 or an SRAM Write Request 840 may be generated. The variables are updated and the updated status is stored in Status Register 822 as well as QRAM 825. The status is also fed to Arbiter 815 to signal that the operation previously requested has been fulfilled, inhibiting duplication of requests. The Status Register 822 updates the four queue registers Q-Out_Ready 745, Q-In_Ready 750, Q-Empty 755 and Q-Full 760 to reflect the new status of the queue that was accessed. Similarly updated are SRAM Addresses 833, Body Write Request 835 and Body Read Requests, which are accessed via DMA to and from the SRAM head and tails for that queue. Alternatively, various processes may wish to write to a queue, as shown by Q Write Data 844, which are selected by multiplexor 846 and pipelined to SRAM Write Request 840. The SRAM controller services the read and write requests by writing the tail or reading the head of the accessed queue and returning an acknowledge. In this manner the various queues are utilized and their status is updated.

FIGS. 19A-C show a least-recently-used register 900 that is employed for choosing which contexts or CCBs to maintain in INIC cache memory. The INIC in this embodiment can cache up to sixteen CCBs in SRAM at a given time, and so when a new CCB is cached an old one must often be discarded, the discarded CCB usually chosen according to this register 900 to be the CCB that has been used least recently. In this embodiment, a hash table for up to two hundred fifty-six CCBs is also maintained in SRAM, while up to five hundred fifty-six full CCBs are held in DRAM. The least-recently-used register 900 contains sixteen four-bit blocks labeled R0-R15, each of which corresponds to an SRAM cache unit. Upon initialization, the blocks are numbered 0-15, with number 0 arbitrarily stored in the block representing the least recently used (LRU) cache unit and number 15 stored in the block representing the most recently used (MRU) cache unit. FIG. 19A shows the register 900 at an arbitrary time when the LRU block R0 holds the number 9 and the MRU block R15 holds the number 6.

When a different CCB than is currently being held in SRAM is to be cached, the LRU block R0 is read, which in FIG. 19A holds the number 9, and the new CCB is stored in the SRAM cache unit corresponding to number 9. Since the new CCB corresponding to number 9 is now the most recently used CCB, the number 9 is stored in the MRU block, as shown in FIG. 19B. The other numbers are all shifted one register block to the left, leaving the number 1 in the LRU block. The CCB that had previously been cached in the SRAM unit corresponding to number 9 has been moved to slower but more cost-effective DRAM.

FIG. 19C shows the result when the next CCB used had already been cached in SRAM. In this example, the CCB was cached in an SRAM unit corresponding to number 10, and so after employment of that CCB, number 10 is stored in the MRU block. Only those numbers which had previously been more recently used than number 10 (register blocks R9-R15) are shifted to the left, leaving the number 1 in the LRU block. In this manner the INIC maintains the most active CCBs in SRAM cache.
In some cases a CCB being used is one that is not desirable to hold in the limited cache memory. For example, it is preferable not to cache a CCB for a context that is known to be closing, so that other cached CCBs can remain in SRAM longer. In this case, the number representing the cache unit holding the decacheable CCB is stored in the LRU block R0 rather than the MRU block R15, so that the decacheable CCB will be replaced immediately upon employment of a new CCB that is cached in the SRAM unit corresponding to the number held in the LRU block R0. FIG. 19D shows the case for which number 8 (which had been in block R9 in FIG. 19C) corresponds to a CCB that will be used and then closed. In this case number 8 has been removed from block R9 and stored in the LRU block R0. All the numbers that had previously been stored to the left of block R9 (R0-R8) are then shifted one block to the right.
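Behaviorally, register 900 is an ordered list of cache-unit numbers, with index 0 standing for R0 (LRU) and index 15 for R15 (MRU). The Python sketch below is a behavioral model only, with hypothetical class and method names; it reproduces the three operations described above: a hit moves a number to R15, a miss reuses the unit named in R0 and moves that number to R15, and a decache moves a number to R0.

```python
class LruRegister:
    """Hypothetical behavioral model of least-recently-used register 900."""

    def __init__(self, units=16):
        # Initialization: R0 (LRU) holds 0, ..., R15 (MRU) holds 15.
        self.blocks = list(range(units))

    def touch(self, unit):
        """Cache hit: move `unit` to MRU block R15.

        Numbers to its right shift one block left; blocks to its left are unchanged.
        """
        i = self.blocks.index(unit)
        self.blocks = self.blocks[:i] + self.blocks[i + 1:] + [unit]

    def allocate(self):
        """Cache miss: reuse the LRU unit and mark it most recently used.

        The caller is expected to write the CCB previously held there to DRAM.
        """
        victim = self.blocks[0]
        self.touch(victim)
        return victim

    def decache(self, unit):
        """CCB about to close: move `unit` to LRU block R0 so it is reused first.

        Numbers to its left shift one block right; blocks to its right are unchanged.
        """
        i = self.blocks.index(unit)
        self.blocks = [unit] + self.blocks[:i] + self.blocks[i + 1:]
```

Starting from a state like FIG. 19A, with 9 in R0, 1 in R1 and 6 in R15, a call to `allocate()` returns 9, places 9 in R15 and leaves 1 in R0, matching FIG. 19B.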

FIG. 20 shows some of the logical units employed to operate the least-recently-used register 900. An array of sixteen three- or four-input multiplexors 910, of which only multiplexors MUX0, MUX7, MUX8, MUX9 and MUX15 are shown for clarity, have outputs fed into the corresponding sixteen blocks of least-recently-used register 900. For example, the output of MUX0 is stored in block R0, the output of MUX7 is stored in block R7, etc. The value of each of the register blocks is connected to an input for its corresponding multiplexor and also into inputs for both adjacent multiplexors, for use in shifting the block numbers. For instance, the number stored in R8 is fed into inputs for MUX7, MUX8 and MUX9. MUX0 and MUX15 each have only one adjacent block, and the extra input for those multiplexors is used for the selection of LRU and MRU blocks, respectively. MUX15 is shown as a four-input multiplexor, with input 915 providing the number stored in R0.

An array of sixteen comparators 920 each receives the value stored in the corresponding block of the least-recently-used register 900. Each comparator also receives a signal from processor 470 along line 935, so that the comparator for the register block holding the number sent by processor 470 outputs true to logic circuits 930, while all the other fifteen comparators output false. Logic circuits 930 control a pair of select lines leading to each of the multiplexors, for selecting inputs to the multiplexors and therefore controlling shifting of the register block numbers. Thus select lines 939 control MUX0, select lines 944 control MUX7, select lines 949 control MUX8, select lines 954 control MUX9 and select lines 959 control MUX15.

When a CCB is to be used, processor 470 checks to see whether the CCB matches a CCB currently held in one of the sixteen cache units. If a match is found, the processor sends a signal along line 935 with the block number corresponding to that cache unit, for example number 12. Comparators 920 compare the signal from line 935 with the block numbers, and comparator C8 provides a true output for the block R8 that matches the signal, while all the other comparators output false. Logic circuits 930, under control from the processor 470, use select lines 959 to choose the input from line 935 for MUX15, storing the number 12 in the MRU block R15. Logic circuits 930 also send signals along the pairs of select lines for MUX8 and higher multiplexors, aside from MUX15, to shift their output one block to the left, by selecting as inputs to each multiplexor MUX8 and higher the value that had been stored in the register block one block to the right (R9-R15). The outputs of multiplexors that are to the left of MUX8 are selected to be constant.

If, on the other hand, processor 470 does not find a match for the CCB among the sixteen cache units, the processor reads from LRU block R0 to identify the cache unit corresponding to the LRU block, and writes the data stored in that cache unit to DRAM. The number that was stored in R0, in this case number 3, is chosen by select lines 959 as input 915 to MUX15 for storage in MRU block R15. The other fifteen multiplexors output to their respective register blocks the numbers that had been stored in each register block immediately to the right.

For the situation in which the processor wishes to remove a CCB from the cache after use, the LRU block R0 rather than the MRU block R15 is selected for placement of the number corresponding to the cache unit holding that CCB. The number corresponding to the CCB to be placed in the LRU block R0 for removal from SRAM (for example number 1, held in block R9) is sent by processor 470 along line 935, which is matched by comparator C9. The processor instructs logic circuits 930 to input the number 1 to R0, by selecting with lines 939 input 935 to MUX0. Select lines 954 to MUX9 choose as input the number held in register block R8, so that the number from R8 is stored in R9. The numbers held by the other register blocks between R0 and R9 are similarly shifted to the right, whereas the numbers in register blocks to the right of R9 are left constant. This frees scarce cache memory from maintaining closed CCBs for many cycles while their identifying numbers move through the register blocks from the MRU to the LRU block.
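At the level of FIG. 20, each block's next value is whatever its multiplexor selects: its own value, a neighbor's value, or an externally supplied number. The Python sketch below models one register update under our own simplifications: the select codes and function names are hypothetical, and the single `EXTERNAL` source stands for line 935 at MUX0 and MUX15 as well as input 915 at MUX15 (for a miss, the number from R0 is simply passed in as the external value).

```python
# Hypothetical select codes for each multiplexor in array 910.
HOLD, FROM_RIGHT, FROM_LEFT, EXTERNAL = range(4)


def comparators(blocks, value):
    """Comparators 920: one boolean per register block."""
    return [b == value for b in blocks]


def select_lines(match, to_mru=True):
    """Logic circuits 930: choose a source for every multiplexor."""
    n = len(match)
    k = match.index(True)          # block whose number matched line 935
    sel = [HOLD] * n               # blocks outside the shifted range stay constant
    if to_mru:
        for i in range(k, n - 1):
            sel[i] = FROM_RIGHT    # shift one block to the left
        sel[n - 1] = EXTERNAL      # MUX15 loads the matched number
    else:
        for i in range(1, k + 1):
            sel[i] = FROM_LEFT     # shift one block to the right
        sel[0] = EXTERNAL          # MUX0 loads the matched number
    return sel


def clock(blocks, sel, external):
    """One register update: every block latches its multiplexor's output."""
    sources = {
        HOLD: lambda i: blocks[i],
        FROM_RIGHT: lambda i: blocks[i + 1],
        FROM_LEFT: lambda i: blocks[i - 1],
        EXTERNAL: lambda i: external,
    }
    return [sources[s](i) for i, s in enumerate(sel)]


def access(blocks, number, to_mru=True):
    """Move `number` to R15 (hit, or miss with number = R0) or to R0 (decache)."""
    sel = select_lines(comparators(blocks, number), to_mru)
    return clock(blocks, sel, number)
```

With number 12 held in R8, as in the example above, `access` leaves R0-R7 unchanged, shifts R9-R15 one block left and loads 12 into R15, all in a single update, which is why the hardware needs only one clock per LRU operation.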

Thus, the above-described devices and systems for processing data communication result in dramatic reductions in the time required for processing large, connection-based messages. Protocol processing speed is tremendously accelerated by specially designed protocol processing hardware as compared with a general purpose CPU running conventional protocol software, and interrupts to the host CPU are also substantially reduced. These advantages can be provided to an existing host by addition of an intelligent network interface card (INIC), or the protocol processing hardware may be integrated with the CPU. In either case, the protocol processing hardware and CPU intelligently decide which device processes a given message, and can change the allocation of that processing based upon conditions of the message.
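The per-message decision described above can be sketched as a comparison of a header summary against the communication control blocks held by the protocol processing hardware. The Python sketch below is hypothetical throughout: the summary fields, the table keyed by a TCP/IP address-and-port four-tuple, and the two conditions that force the conventional path are assumptions chosen for illustration, not criteria defined in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeaderSummary:
    """Hypothetical summary produced by processing a frame's headers as a group."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    checksums_ok: bool
    needs_slow_path: bool  # assumed flag for exceptional conditions


def choose_path(summary, ccbs):
    """Return "fast" if the hardware may move the data directly, else "slow"."""
    key = (summary.src_ip, summary.dst_ip, summary.src_port, summary.dst_port)
    if summary.checksums_ok and not summary.needs_slow_path and key in ccbs:
        return "fast"   # bypass the host protocol stack, guided by the matching CCB
    return "slow"       # hand the frame to the host's protocol stack
```

Keying the table by the connection identity means a single lookup both confirms that a context exists and retrieves the control block that guides placement of the data.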
1. A method for communication between a network and a host computer having a processor and a sequential stack of protocol layers, the method comprising:

receiving, by said host from said network, a message packet including data and a plurality of headers corresponding to said stack of protocol layers, said data intended for placement in a destination of said host according to protocol processing of said headers,

processing, as a group and at one time, said plurality of headers, including creating a summary of said group of headers,

choosing, based upon said summary, whether to process said packet by said protocol layers, and

sending said data to said destination according to said summary of said group of headers, whereby sequential processing of said packet by said stack of protocol layers is avoided.

2. The method of claim 1, wherein said processing of said group of headers occurs during said receiving, by said host from said network, of said message packet.

3. The method of claim 1, further comprising creating a communication control block for a connection including said packet, and matching said summary with said communication control block, for sending said data to said destination.

4. The method of claim 1, further comprising creating a communication control block for a connection including said packet, wherein sending said data to said destination includes guiding said data by said communication control block.

5. The method of claim 3, further comprising transmitting a second message packet from said host to said network by referencing said communication control block.
6. A method for processing communication between a network and a host having a sequential protocol processing stack, the method comprising:

providing a device including a communication processor, said device being connected to said host and said network,

receiving a message frame from said network by said host, said frame including data and a series of headers corresponding to said sequential protocol processing stack,

analyzing said series of headers as a stream of bytes by said device, including processing said headers without copying said data, thereby creating a summary of said frame, and

selecting, based upon said processing, whether to process said packet by said stack or to send said data to a destination according to said summary.

7. The method of claim 6, further comprising:

creating, by said host, a communication control block for a message including said frame,

storing said communication control block in said device, and

guiding said data to a destination denoted by said communication control block.

8. The method of claim 7, further comprising comparing said summary with said communication control block, prior to guiding said data to said destination.

9. The method of claim 6, further comprising:

transmitting, via said device, transmission data from said host to said network, including simultaneously prepending several protocol headers to said transmission data for network transfer to a remote host.
10. A method for communication between a network and a host computer having a processor and a sequential stack of protocol layers, the method comprising:

receiving, by said host from said network, a message having multiple packets, each of said packets including a data portion and an associated sequence of headers which include information corresponding to said sequential stack of protocol layers and indicate an upper layer destination in said host for said data, and

sending a plurality of said data portions to said destination without said associated headers and without generating an interrupt to any host CPU.

11. The method of claim 10, further comprising choosing whether to process said packets by said stack of protocol layers, prior to sending said data portions to said destination.

12. The method of claim 10, further comprising providing to said host a protocol processing device, and

summarizing said headers with said device, prior to sending said data portions to said destination without said headers.

13. The method of claim 10, further comprising transmitting a data file from said host to said network, including dividing said data file into a series of data units, prepending headers to said data units and thereby creating a series of network frames, and placing said network frames on said network without generating an interrupt to any host CPU.
14. A method for communication between a host computer and a network, the host computer having a CPU, a storage unit and a sequential stack of protocol layers, the method comprising:

providing a device connected to said network and said host, said device having a processor,

receiving by said device a first message from said network,

processing said first message, including creating a communication control block for said first message,

receiving by said device a second message from said network, said second message including data and a header, said header including a series of protocol layer headers,

processing said header by said device, including generating a summary of said header, without copying said data during said processing of said header, and

sending said data by said device to an upper layer of said protocol layers in a form suitable for said upper layer, including guiding said sending with said communication control block.

15. The method of claim 14, further comprising receiving by said device a third message relating to said first and second messages, and passing said communication control block from said device to said storage unit, thereby passing control of processing said third message to said CPU.

16. The method of claim 14, further comprising matching said summary with said communication control block, prior to sending said data to said upper layer.

17. The method of claim 14, further comprising transmitting from said host to said network a third message, including referencing, by said device, said communication control block and prepending a transmission header to data acquired from a host source, said transmission header including a plurality of protocol layer headers.
18. A method for communication between a local host and a remote host connected by a network, with the local host having a protocol processing stack and an associated protocol processing device, the method comprising:

creating, by the protocol processing stack, a communication control block defining a connection between the local host and the remote host,

passing said communication control block to the device, and thereby passing control of processing a message packet associated with said connection and transferred between the network and the local host, such that said packet is processed by the device instead of by the protocol processing stack.

19. The method of claim 18, further comprising passing said communication control block back to the local host, such that a second message packet transferred between the network and the local host and associated with said connection is generally processed by the protocol processing stack.

20. The method of claim 18, further comprising:

receiving, by the device, a message frame from the network,

summarizing, by the device, said message frame, thereby generating a summary of said message frame, and

comparing said summary with said communication control block.

21. The method of claim 18, further comprising:

transmitting, by the device, a message frame to the network, including forming a header based upon said communication control block and prepending said header to said message frame.
22. A method for network communication by a host computer having a processor, a memory and a sequential stack of protocol layers, the method comprising:

receiving by the host from the network a packet including data and a plurality of headers relating to the stack of protocol layers, said data having a destination in said host,

categorizing said packet with a hardware logic sequencer, including classifying said headers and creating a summary of said packet, and

choosing, based upon said summary, whether to send said packet to said stack of protocol layers or to bypass said stack of protocol layers by sending said data to said destination.

23. The method of claim 22, further comprising:

sending said packet to said stack of protocol layers,

processing said packet with said stack of protocol layers and thereby creating a context for said message,

receiving by said host from said network a related packet including additional data and additional headers, and

employing said context for sending said related packet to said destination without processing said packet by said stack of protocol layers.

24. The method of claim 22, further comprising creating a context for a message
including said packet, said context defining a connection between said host and a remote
host, wherein choosing whether to send said packet to said stack of protocol layers or to
bypass said stack of protocol layers includes comparing said summary with said context.



25. The method of claim 22, further comprising bypassing said stack of protocol layers by
sending said data to said destination in a form suitable for said destination.
26. A device for use with a local host that is connectable to a remote host via a network,
the local host containing a CPU operating a stack of protocol processing layers that define a
connection context between an application of the local host and an application of the remote
host, the device comprising:

a communication processing mechanism containing a processor that chooses, by referencing the context,
whether to process a network message through the protocol processing layers or to avoid the
protocol processing layers and employ the context for transferring data contained in said
message between the network and the local host application.

27. The device of claim 26, wherein said communication processing mechanism includes
a receive sequencer connected to said processor and configured for validating a message
packet received from the network, and said message packet contains control information
corresponding to the stack of protocol layers.

28. The device of claim 26, wherein said communication processing mechanism includes
a receive sequencer connected to said processor and configured for generating a summary of
a message packet received from the network, said message packet containing control
information corresponding to the stack of protocol layers, with said processor adapted for
comparing said summary with said context.

29. The device of claim 26, wherein said processor is adapted for creating a header
corresponding to said context and including control information corresponding to several of
the protocol processing layers, and prepending said header to said data for transmission of
[illegible].

30. The device of claim 26, wherein said communication processing mechanism has a
direct memory access unit for sending, based upon said context, said data from said
communication processing mechanism to the host application, without a header
accompanying said data.
31. The device of claim 27, wherein said processor includes a plurality of
microprocessors, with at least one of said microprocessors primarily adapted for processing
messages received by the host from the network and a second of said microprocessors
adapted for processing messages transmitted from the host to the network.

32. The device of claim 31, wherein said microprocessors utilize shared hardware
functions in rotating phases.

33. The device of claim 26, wherein said communication processing mechanism includes
a queue manager configured for queuing information in a plurality of queue storage units,
wherein at least one of said queue storage units contains SRAM and DRAM.

34. A communication device for a host computer connectable to a network, the host
computer having a CPU with a stack of protocol processing layers operable by the CPU for
processing network communications and defining a connection context between a destination
in the host and a source in a remote host, the device comprising:

a receive sequencer connected to the network and configured for validating a
message packet received from the network by the device, said message packet containing
data and a header with control information regarding several of said protocol layers, said
receive sequencer adapted for creating a summary of said packet for storage in the device,
and

a communication processor connected to said receive sequencer and to the
host, said communication processor adapted for comparing said summary with the
communication control block and choosing, based upon said comparing, whether to send said
message packet to the stack for protocol processing or to send said data directly to the
destination without processing said message packet by the stack.

35. The device of claim 34, wherein said communication processor contains a plurality of
pipelined microprocessors operating in rotating phases, with at least one of said
microprocessors configured for processing messages received by the host from the network
and at least one of said microprocessors configured for processing messages transmitted from
the host to the network.
36. The device of claim 9, further comprising a queue manager containing logic circuits
configured for queuing information for said communication processor and said receive
sequencer in a plurality of storage queues.

37. The device of claim 36, wherein said queues contain both SRAM and DRAM.

38. The device of claim 34, further comprising a cache memory and a plurality of
connection contexts, with a hash of each said connection context being stored in said cache
memory for comparison with said summary.

39. The device of claim 38, further comprising a least recently used register for
determining which of said connection contexts are stored in said cache memory.
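Claims 38 and 39 describe a cache of connection-context hashes governed by a least recently used register. A minimal Python sketch could look like the following; it assumes CRC-32 as the hash and an `OrderedDict` standing in for the LRU register (both illustrative choices, and `ContextCache` is a hypothetical name). On a hash hit the full connection tuple is still compared, on the assumption that a hash collision must not select the wrong connection.

```python
import zlib
from collections import OrderedDict


class ContextCache:
    """Hypothetical cache of connection-context hashes with LRU replacement."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        # hash -> (connection tuple, context); iteration order is LRU order.
        self.entries = OrderedDict()

    @staticmethod
    def hash_tuple(conn: tuple) -> int:
        # Assumed hash: CRC-32 over a textual form of the connection tuple.
        return zlib.crc32(repr(conn).encode())

    def insert(self, conn: tuple, context) -> None:
        h = self.hash_tuple(conn)
        self.entries[h] = (conn, context)       # a colliding entry is overwritten
        self.entries.move_to_end(h)             # newest entry is most recently used
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry

    def lookup(self, conn: tuple):
        h = self.hash_tuple(conn)
        hit = self.entries.get(h)
        if hit is None or hit[0] != conn:       # miss, or hash collision
            return None
        self.entries.move_to_end(h)             # refresh LRU position on a hit
        return hit[1]
```

A lookup miss corresponds to the case in which the packet summary matches no cached context and the packet is handled by the host protocol stack instead.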

40. The device of claim 34, wherein said receive sequencer has a sequence of hardware
logic units for processing a header contained in said message packet and having control
information substantially corresponding to said stack of protocol processing layers.

41. The device of claim 34, wherein said communication processor commands a direct
memory access controller for sending, based upon said context, said data to the host
destination without said header.

42. The device of claim 34, wherein said communication processor is configured for
transmitting a second message and thereby transferring transmit data from the host
destination to the network, including prepending a header derived from said context to said
transmit data.
43. A device for communication between a local host with a CPU and a remote host, the
hosts connected by a network, the device comprising:

a communication processing mechanism connected to the network and to the
local host, said mechanism including hardware logic for processing a network packet and
generating a summary of said packet,

a protocol processing stack disposed in the local host and operable by the CPU
for creating a communication control block and passing said communication control block to
said mechanism, with said communication control block defining a connection between the
local host and the remote host,

wherein said mechanism and said protocol processing stack are arranged such
that a message corresponding to said connection and transferred between said network and
said local host is processed by said mechanism instead of by the CPU when said mechanism
is holding said communication control block.

44. The device of claim 43, wherein said mechanism has a plurality of network
connections.

45. A device for processing communication between a network and a host having a stack
of protocol layers, said device comprising:

a plurality of logic units for categorizing a message packet received from the
network, said packet including data and a header and flowing through said logic units as a
stream of bits, with said logic units creating a summary of said packet from said stream,
a memory for storing said packet and said summary, and
a microprocessor for matching said summary with a connection context and
for moving said packet without said protocol information to a destination in the host indicated
by said context.

46. The device of claim 45, wherein said microprocessor includes a plurality of pipelined
processors, with one of said processors configured for transmitting network messages and
another of said processors configured for receiving network messages.

47. The device of claim 45, further comprising a CPU operating a stack of protocol
processing layers for processing a second packet having a second summary not matching said
connection context.
48. A device for transferring a message between a network and a host computer, the
message including a series of headers corresponding to a sequence of protocol layers, the
device comprising a series of sequencers adapted for processing the series of headers as a
stream of bits and generating a status of said message, with at least one of said sequencers
including logic for categorizing a header corresponding to an upper layer of said protocol
layers.

49. The device of claim 48, further comprising a communication processor connected to
said sequencers and capable of employing said status for sending said message without the
headers to a destination of said host.

50. A device for transmitting messages between a network and a host, the device
comprising:

an array of variable length FIFO circuits defining a plurality of queues,
a receive sequencer configured for generating a status of a frame of the
messages received from the network and storing said status in at least one of said queues, and

a protocol processor containing a plurality of pipelined microprocessors
operating a set of logical units in alternating phases, with a first of said microprocessors
adapted for processing the messages received from the network and a second of said
microprocessors adapted for processing the messages transmitted to the network, said
processor employing a communication control block in concert with said sequencer for
accelerating transfer of the messages.

51. The device of claim 50, wherein at least one of said queues includes a DRAM and an
SRAM storage unit.
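The array of variable-length FIFO circuits in claims 50 and 51 can be modeled in software as several circular queues of different, configurable sizes carved out of one shared buffer. The sketch below is an illustrative model under stated assumptions: a Python list stands in for the SRAM and DRAM storage, `FifoArray` is a hypothetical name, and the receive sequencer is assumed to push frame status entries that the protocol processor later pops.

```python
class FifoArray:
    """Hypothetical model: circular FIFOs of different lengths in one buffer."""

    def __init__(self, lengths):
        self.buf = [None] * sum(lengths)    # shared backing storage
        self.base = []                      # start offset of each queue
        start = 0
        for n in lengths:
            self.base.append(start)
            start += n
        self.size = list(lengths)           # capacity of each queue
        self.head = [0] * len(lengths)      # next slot to read, per queue
        self.count = [0] * len(lengths)     # entries currently queued

    def push(self, q: int, item) -> bool:
        """Append item to queue q; return False if that queue is full."""
        if self.count[q] == self.size[q]:
            return False
        tail = (self.head[q] + self.count[q]) % self.size[q]
        self.buf[self.base[q] + tail] = item
        self.count[q] += 1
        return True

    def pop(self, q: int):
        """Remove and return the oldest item in queue q, or None if empty."""
        if self.count[q] == 0:
            return None
        item = self.buf[self.base[q] + self.head[q]]
        self.head[q] = (self.head[q] + 1) % self.size[q]
        self.count[q] -= 1
        return item
```

Because each queue wraps within its own slice of the buffer, queue lengths can be chosen independently without moving data between queues.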
52. A device for communication between a network and a host computer having a
processor and a sequential stack of protocol layers, the device comprising:

means for receiving, by said host from said network, a message packet
including data and a plurality of headers corresponding to said stack of protocol layers, said
data intended for placement in a destination of said host according to protocol processing of
said headers,

means for processing said plurality of headers, including creating a summary
of said group of headers, and

means for sending said data to said destination according to said summary of
said group of headers, whereby sequential processing of said packet by said stack of protocol
layers is avoided.